Results 1–10 of 16
Offline planning for communication by exploiting structured interactions in decentralized MDPs
 In IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies (WI-IAT '09), 2009
Cited by 15 (0 self)
Abstract:
Variants of the decentralized MDP model focus on problems exhibiting some special structure that makes them easier to solve in practice. Our work is concerned with two main issues. First, we propose a new model, Event-Driven Interaction with Complex Rewards, that addresses problems having structured transition and reward dependence. Our model captures a wider range of problems than existing structured models. In spite of its generality, the model still offers structure that can be leveraged by heuristics and solution algorithms. This is facilitated by explicitly representing interactions as first-class entities. We formulate and solve instances of our model as bilinear programs. Second, we look at making offline planning for communication tractable. To this end, we propose heuristics that limit problem size by making communication available only at a few strategically chosen points based on an analysis that exploits problem structure in the proposed model. Experimental results demonstrate a reduction in problem size and solution time using restricted communication, with little or no decrease in solution quality. Our heuristics therefore allow us to solve problems that would otherwise be intractable.
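The bilinear-program formulation mentioned in this abstract can be illustrated with a toy example. The sketch below is not the paper's actual formulation: the payoff matrix `Q`, the simplex constraints, and the alternating best-response scheme are all invented for illustration. It maximizes x^T Q y over two probability simplices by alternating linear best responses, each of which reduces to a deterministic vertex choice:

```python
# Illustrative only: alternating best response for a tiny bilinear program
#   max_{x,y} x^T Q y,  with x and y on probability simplices.
# The matrix Q is invented; real bilinear programs over policy polytopes
# are solved with dedicated solvers rather than this local scheme.

def solve_bilinear(Q, iters=20):
    n, m = len(Q), len(Q[0])
    x = [1.0 / n] * n                      # start from uniform choices
    y = [1.0 / m] * m
    for _ in range(iters):
        # Fix y: max_x x^T (Q y) over a simplex is attained at a vertex,
        # i.e. a deterministic choice of the best row.
        row_scores = [sum(Q[i][j] * y[j] for j in range(m)) for i in range(n)]
        i_star = max(range(n), key=lambda i: row_scores[i])
        x = [1.0 if i == i_star else 0.0 for i in range(n)]
        # Fix x: the symmetric step in y.
        col_scores = [sum(Q[i][j] * x[i] for i in range(n)) for j in range(m)]
        j_star = max(range(m), key=lambda j: col_scores[j])
        y = [1.0 if j == j_star else 0.0 for j in range(m)]
    value = sum(x[i] * Q[i][j] * y[j] for i in range(n) for j in range(m))
    return x, y, value

Q = [[1.0, 0.0],
     [0.0, 2.0]]
x, y, value = solve_bilinear(Q)
```

Alternating best response only guarantees a local optimum of a bilinear program in general; on this tiny `Q` it happens to find the global one.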
Decentralized Control of Partially Observable Markov Decision Processes
Cited by 14 (8 self)
Abstract:
Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or corruption). This paper surveys recent work on decentralized control of MDPs in which control of each agent depends on a partial view of the world. We focus on a general framework where there may be uncertainty about the state of the environment, represented as a decentralized partially observable MDP (Dec-POMDP), but consider a number of subclasses with different assumptions about uncertainty and agent independence. In these models, a shared objective function is used, but plans of action must be based on a partial view of the environment. We describe the frameworks, along with the complexity of optimal control and important properties. We also provide an overview of exact and approximate solution methods as well as relevant applications. This survey provides an introduction to what has become an active area of research on these models and their solutions.
Scaling up decentralized MDPs through heuristic search
 In UAI, 2012
Cited by 12 (3 self)
Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) are rich models for cooperative decision-making under uncertainty, but are often intractable to solve optimally (NEXP-complete). The transition and observation independent Dec-MDP is a general subclass that has been shown to have complexity in NP, but optimal algorithms for this subclass are still inefficient in practice. In this paper, we first provide an updated proof that an optimal policy does not depend on the histories of the agents, but only the local observations. We then present a new algorithm based on heuristic search that is able to expand search nodes by using constraint optimization. We show experimental results comparing our approach with the state-of-the-art Dec-MDP and Dec-POMDP solvers. These results show a reduction in computation time and an increase in scalability by multiple orders of magnitude in a number of benchmarks.
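The heuristic-search idea in this abstract (expand nodes in best-first order of an optimistic upper bound and stop once no remaining bound can beat the incumbent) can be sketched generically. Everything below, including the toy bit-string problem and all names, is invented for illustration and is not the paper's algorithm:

```python
# Illustrative only: generic best-first search with an optimistic upper
# bound, in the spirit of heuristic search over partial policies.
import heapq
import itertools

def best_first(root, expand, upper_bound, is_terminal, value):
    tie = itertools.count()                # heap tiebreaker for equal bounds
    frontier = [(-upper_bound(root), next(tie), root)]
    best_node, best_val = None, float("-inf")
    while frontier:
        neg_ub, _, node = heapq.heappop(frontier)
        if -neg_ub <= best_val:            # no node left can beat the incumbent
            break
        if is_terminal(node):
            if value(node) > best_val:
                best_node, best_val = node, value(node)
            continue
        for child in expand(node):
            heapq.heappush(frontier, (-upper_bound(child), next(tie), child))
    return best_node, best_val

# Toy usage: maximize the number of 1s in a length-3 bit string.
node, val = best_first(
    "",
    expand=lambda s: [s + "0", s + "1"],
    upper_bound=lambda s: s.count("1") + (3 - len(s)),  # optimistic completion
    is_terminal=lambda s: len(s) == 3,
    value=lambda s: s.count("1"),
)
```

Because the upper bound is admissible (it never underestimates the best completion), the early break is safe and the search returns an optimal terminal node.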
An investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs
 In The Journal of Artificial Intelligence Research
Cited by 10 (0 self)
Abstract:
Decentralized planning in uncertain environments is a complex task generally dealt with by using a decision-theoretic approach, mainly through the framework of Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs). Although DEC-POMDPs are a general and powerful modeling tool, solving them is a task with an overwhelming complexity that can be doubly exponential. In this paper, we study an alternate formulation of DEC-POMDPs relying on a sequence-form representation of policies. From this formulation, we show how to derive Mixed Integer Linear Programming (MILP) problems that, once solved, give exact optimal solutions to the DEC-POMDPs. We show that these MILPs can be derived either by using some combinatorial characteristics of the optimal solutions of the DEC-POMDPs or by using concepts borrowed from game theory. Through an experimental validation on classical test problems from the DEC-POMDP literature, we compare our approach to existing algorithms. Results show that mathematical programming outperforms dynamic programming but is less efficient than forward search, except for some particular problems. The main contributions of this work are the use of mathematical programming for DEC-POMDPs and a better understanding of DEC-POMDPs and of their solutions. Besides, we argue that our alternate representation of DEC-POMDPs could be helpful for designing novel algorithms looking for approximate solutions to DEC-POMDPs.
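One combinatorial characteristic that such MILPs exploit is that a finite-horizon DEC-POMDP admits an optimal deterministic joint policy, so policy variables can be restricted to 0/1 values. A minimal sketch of that idea on an invented one-step, two-agent reward matrix (all numbers hypothetical, and brute-force enumeration standing in for the integer program):

```python
# Illustrative only: for a one-step, two-agent problem, an optimal joint
# policy is deterministic, so the search reduces to 0/1 choices, which is
# exactly what the MILP's integer variables encode. The matrix is invented.
from itertools import product

R = [[3.0, 0.0],   # R[a1][a2]: joint reward for actions a1, a2
     [0.0, 5.0]]

best = max(product(range(2), range(2)), key=lambda a: R[a[0]][a[1]])
best_value = R[best[0]][best[1]]
```

With horizons and observations, the number of such deterministic choices explodes, which is why the sequence-form encoding and MILP machinery matter.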
Influence-based abstraction for multiagent systems
 In AAAI, 2012
Cited by 9 (4 self)
Abstract:
This paper presents a theoretical advance by which factored POSGs can be decomposed into local models. We formalize the interface between such local models as the influence agents can exert on one another; and we prove that this interface is sufficient for decoupling them. The resulting influence-based abstraction substantially generalizes previous work on exploiting weakly-coupled agent interaction structures. Therein lie several important contributions. First, our general formulation sheds new light on the theoretical relationships among previous approaches, and promotes future empirical comparisons that could come by extending them beyond the more specific problem contexts for which they were developed. More importantly, the influence-based approaches that we generalize have shown promising improvements in the scalability of planning for more restrictive models. Thus, our theoretical result here serves as the foundation for practical algorithms that we anticipate will bring similar improvements to more general planning contexts, and also into other domains such as approximate planning, decision-making in adversarial domains, and online learning.
Producing efficient error-bounded solutions for transition independent decentralized MDPs
 In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems, 2013
Cited by 9 (4 self)
Abstract:
There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading to either poor solution quality or limited scalability. This paper presents the first approach for solving transition independent decentralized Markov decision processes (Dec-MDPs) that inherits these properties. Two related algorithms illustrate this approach. The first recasts the original problem as a deterministic and completely observable Markov decision process. In this form, the original problem is solved by combining heuristic search with constraint optimization to quickly converge to a near-optimal policy. This algorithm also provides the foundation for the first algorithm for solving infinite-horizon transition independent decentralized MDPs. We demonstrate that both methods outperform state-of-the-art algorithms by multiple orders of magnitude, and for infinite-horizon decentralized MDPs, the algorithm is able to construct more concise policies by searching cyclic policy graphs.
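The error-bound property this abstract asks of Dec-MDP solvers is the kind familiar from single-agent value iteration, where a small Bellman residual certifies closeness to the optimum. A minimal single-agent sketch on an invented two-state MDP (standard MDP machinery, not the paper's Dec-MDP algorithm):

```python
# Illustrative only: value iteration with a certified stopping rule. When
# the Bellman residual `res` satisfies gamma * res / (1 - gamma) <= eps,
# the returned V is within eps of the optimal value function in max norm.

def value_iteration(P, R, gamma=0.9, eps=1e-6):
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [
            max(R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
                for a in range(len(R[s])))
            for s in range(n)
        ]
        res = max(abs(V_new[s] - V[s]) for s in range(n))
        V = V_new
        if gamma * res / (1 - gamma) <= eps:   # certified a-priori error bound
            return V

# Invented two-state, two-action MDP.
P = [  # P[a][s][t]: transition probabilities
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in place
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch states
]
R = [[0.0, 1.0],   # state 0: reward 1 for switching
     [2.0, 0.0]]   # state 1: reward 2 for staying
V = value_iteration(P, R)
```

For this toy instance the optimal values are V* = [19, 20] (stay in state 1 forever, reach it from state 0), and the stopping rule guarantees the returned V is within the stated tolerance of them.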
Abstracting Influences for Efficient Multiagent Coordination Under Uncertainty
Robust Online Optimization of Reward-Uncertain MDPs
 In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Cited by 5 (1 self)
Abstract:
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the number of nondominated policies is generally so large as to undermine this leverage. In this paper, we show how the quality of the approximation can be improved online by pruning/adding nondominated policies during reward elicitation, while maintaining computational tractability. Drawing insights from the POMDP literature, we also develop a new anytime algorithm for constructing the set of nondominated policies with provable (anytime) error bounds. These bounds can be exploited to great effect in our online approximation scheme.
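Minimax regret itself is easy to illustrate: choose the policy whose worst-case gap to the best policy, over all feasible rewards, is smallest. The sketch below uses an invented one-dimensional reward parameter and a grid approximation, not the paper's exact computation over nondominated policies:

```python
# Illustrative only: minimax regret under a hypothetical one-dimensional
# reward uncertainty. Each policy's value is linear in an unknown parameter
# w in [0, 1]; all numbers are invented, and the grid over w is a crude
# stand-in for the exact optimization.
policies = {            # value(p, w) = a + b * w
    "safe":  (1.0, 0.0),   # constant value, insensitive to w
    "risky": (0.0, 1.5),   # good only when w is large
}
grid = [i / 100 for i in range(101)]

def value(p, w):
    a, b = policies[p]
    return a + b * w

def max_regret(p):
    # Worst-case gap to the best available policy over feasible rewards.
    return max(max(value(q, w) for q in policies) - value(p, w) for w in grid)

minimax = min(policies, key=max_regret)
```

Here "safe" loses at most 0.5 in the worst case (at w = 1) while "risky" can lose 1.0 (at w = 0), so minimax regret selects "safe" even though "risky" has the higher best-case value.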
Robust Value Function Approximation Using Bilinear Programming
Cited by 2 (2 self)
Abstract:
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem.
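The Bellman residual being minimized can be sketched directly: it is the largest absolute gap between a candidate value function and its Bellman backup. The two-state MDP below is invented for illustration; the paper's contribution is finding the minimizing V within an approximation architecture, which this sketch does not attempt:

```python
# Illustrative only: computing the Bellman residual of a candidate value
# function V on an invented two-state, two-action MDP. Approximate bilinear
# programming seeks the V minimizing this quantity.

def bellman_residual(V, P, R, gamma=0.9):
    n = len(V)
    TV = [  # one Bellman backup of V
        max(R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
            for a in range(len(R[s])))
        for s in range(n)
    ]
    return max(abs(TV[s] - V[s]) for s in range(n))

P = [  # P[a][s][t]: transition probabilities
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch
]
R = [[0.0, 1.0],   # state 0: reward 1 for switching
     [2.0, 0.0]]   # state 1: reward 2 for staying

exact = bellman_residual([19.0, 20.0], P, R)   # the optimal V: residual 0
rough = bellman_residual([0.0, 0.0], P, R)     # a poor guess: large residual
```

A zero residual certifies that the candidate is the optimal value function, which is what makes residual minimization a natural objective for a priori guarantees.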