Results 1–10 of 46
The Complexity of Decentralized Control of Markov Decision Processes
 Mathematics of Operations Research
, 2000
"... We consider decentralized control of Markov decision processes and give complexity bounds on the worstcase running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partiallyobservable case that allow for decentralized control are described. ..."
Abstract

Cited by 403 (47 self)
We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully observable case and the partially observable case that allow for decentralized control are described. For even two agents, the finite-horizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP ≠ NEXP, the problems require superexponential time to solve in the worst case.
Solving transition independent decentralized Markov decision processes
 JAIR
, 2004
"... Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of thes ..."
Abstract

Cited by 107 (14 self)
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
Transition-Independent Decentralized Markov Decision Processes
, 2003
"... There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multiagent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXPhard, provides ..."
Abstract

Cited by 76 (15 self)
There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multiagent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique to solve optimally a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions.
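The computational leverage of transition independence can be seen in a toy sketch: because the agents' dynamics do not interact, the probability of a joint event factors into a product of per-agent probabilities, so a joint policy can be evaluated from per-agent marginals alone. The corridor instance, costs, and rewards below are invented for illustration and are not taken from the paper.

```python
import itertools

# Hypothetical transition-independent Dec-MDP: each agent walks a 3-cell
# corridor [0, 1, 2]; action "R" advances with probability 0.8 (else stays),
# action "S" always stays.  Transitions are independent across agents; a
# global reward of +10 is paid only if BOTH agents occupy cell 2 at the
# final step, and each "R" action costs 0.1.
P_MOVE = 0.8
GOAL, HORIZON = 2, 3

def state_dist(plan):
    """Exact distribution over one agent's final cell for an open-loop plan."""
    dist = {0: 1.0}
    for a in plan:
        nxt = {}
        for s, p in dist.items():
            if a == "R" and s < GOAL:
                nxt[s + 1] = nxt.get(s + 1, 0.0) + p * P_MOVE
                nxt[s] = nxt.get(s, 0.0) + p * (1 - P_MOVE)
            else:
                nxt[s] = nxt.get(s, 0.0) + p
        dist = nxt
    return dist

def joint_value(plan1, plan2):
    # Local cost: -0.1 per movement action, for each agent separately.
    local = -0.1 * (plan1.count("R") + plan2.count("R"))
    # Transition independence lets the joint-event probability factor:
    # P(both at goal) = P(agent 1 at goal) * P(agent 2 at goal).
    p_joint = state_dist(plan1).get(GOAL, 0.0) * state_dist(plan2).get(GOAL, 0.0)
    return local + 10.0 * p_joint

plans = list(itertools.product("RS", repeat=HORIZON))
best = max(itertools.product(plans, plans), key=lambda jp: joint_value(*jp))
print(best, round(joint_value(*best), 4))
```

Open-loop plans are used only to keep the enumeration small; the factorization of the joint-event probability is the point of the sketch.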
Bayesian sparse sampling for online reward optimization
 In ICML ’05: Proceedings of the 22nd international conference on Machine learning
, 2005
"... We present an efficient “sparse sampling ” technique for approximating Bayes optimal decision making in reinforcement learning, addressing the well known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making whil ..."
Abstract

Cited by 53 (5 self)
We present an efficient “sparse sampling” technique for approximating Bayes-optimal decision making in reinforcement learning, addressing the well-known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior—rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.
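As a simplified illustration of the Bayes-optimal lookahead that sparse sampling approximates, the sketch below computes the exact depth-limited Bayes-optimal value for a two-armed Bernoulli bandit with Beta(1, 1) priors; the bandit instance and depth are assumptions, and the paper's contribution is to approximate such a tree by sampling posterior outcomes rather than enumerating them as done here.

```python
from functools import lru_cache

# Exact depth-limited Bayes-optimal lookahead for a 2-armed Bernoulli bandit
# with Beta(1, 1) priors (a toy stand-in for the tree sparse sampling grows).
@lru_cache(maxsize=None)
def bayes_value(counts, depth):
    """Expected total reward of acting Bayes-optimally for `depth` more pulls.

    `counts` = ((s1, f1), (s2, f2)): observed successes/failures per arm,
    which together with the Beta(1, 1) prior summarize the posterior.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for arm, (s, f) in enumerate(counts):
        p = (s + 1) / (s + f + 2)          # posterior mean of the arm's bias
        # Reward 1 w.p. p (posterior updates s -> s+1), else 0 (f -> f+1).
        win = list(counts); win[arm] = (s + 1, f)
        lose = list(counts); lose[arm] = (s, f + 1)
        q = p * (1 + bayes_value(tuple(win), depth - 1)) \
            + (1 - p) * bayes_value(tuple(lose), depth - 1)
        best = max(best, q)
    return best

v = bayes_value(((0, 0), (0, 0)), 3)
print(round(v, 4))
```

The number of posterior states grows exponentially with depth, which is exactly why sampling a few branches per node, as in sparse sampling, becomes necessary.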
Complexity of Planning with Partial Observability
 ICAPS 2004. Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling
, 2004
"... We show that for conditional planning with partial observability the problem of testing existence of plans with success probability 1 is 2EXPcomplete. This result completes the complexity picture for nonprobabilistic propositional planning. We also give new proofs for the EXPhardness of conditio ..."
Abstract

Cited by 48 (3 self)
We show that for conditional planning with partial observability the problem of testing existence of plans with success probability 1 is 2-EXP-complete. This result completes the complexity picture for nonprobabilistic propositional planning. We also give new proofs for the EXP-hardness of conditional planning with full observability and the EXPSPACE-hardness of conditional planning without observability. The proofs demonstrate how lack of full observability allows the encoding of exponential-space Turing machines in the planning problem, and how the necessity to have branching in plans corresponds to the move to a complexity class defined in terms of alternation from the corresponding deterministic complexity class. Lack of full observability necessitates the use of belief states, the number of which is exponential in the number of states, and alternation corresponds to the choices a branching plan can make.
Greedy linear value-approximation for factored Markov decision processes
 In Proceedings of the 18th National Conference on Artificial Intelligence
, 2002
"... Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very la ..."
Abstract

Cited by 35 (7 self)
Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? And where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation, showing that this is an inherently hard problem.
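To make the Bellman-error objective concrete, here is a minimal sketch on an invented chain MDP (the instance and basis function are not from the paper): with a single basis function the best weight can be found by one-dimensional grid search, whereas the paper's point is that minimizing this error is inherently hard in general.

```python
# Hypothetical 3-state chain MDP: "go" advances (capped at state 2), "stay"
# does nothing; reward 1 is paid whenever the current state is 2.  We
# approximate V with one basis function phi(s) = s + 1, i.e. V_w(s) =
# w * (s + 1), and measure the max-norm Bellman error of each candidate w.
GAMMA = 0.9
STATES = (0, 1, 2)
ACTIONS = ("go", "stay")

def step(s, a):
    """Deterministic successor state and reward."""
    s2 = min(s + 1, 2) if a == "go" else s
    return s2, (1.0 if s == 2 else 0.0)

def bellman_error(w):
    """L-infinity distance between V_w and its Bellman backup T V_w."""
    err = 0.0
    for s in STATES:
        v_w = w * (s + 1)
        backup = max(r + GAMMA * w * (s2 + 1)
                     for s2, r in (step(s, a) for a in ACTIONS))
        err = max(err, abs(v_w - backup))
    return err

# Grid search over the one-dimensional weight space [0, 2].
ws = [i / 100 for i in range(201)]
w_star = min(ws, key=bellman_error)
print(w_star, round(bellman_error(w_star), 4))
```

With many basis functions the search space is high-dimensional and the max-norm objective is piecewise linear but non-convex in general, which is where the hardness result bites.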
Nearly deterministic abstractions of Markov decision processes
 In AAAI2002
, 2002
"... We examine scaling issues for a restricted class of compactly representable Markov decision process planning problems. For one stochastic mobile robotics package delivery problem it is possible to decouple the stochastic localnavigation problem from the deterministic globalrouting one and to solv ..."
Abstract

Cited by 24 (5 self)
We examine scaling issues for a restricted class of compactly representable Markov decision process planning problems. For one stochastic mobile robotics package delivery problem it is possible to decouple the stochastic local-navigation problem from the deterministic global-routing one and to solve each with dedicated methods. Careful construction of macro actions allows us to effectively “hide” navigational stochasticity from the global routing problem and to approximate the latter with off-the-shelf combinatorial optimization routines for the traveling salesdroid problem, yielding a net exponential speedup in planning performance. We give analytic conditions on when the macros are close enough to deterministic for the approximation to be good and demonstrate the performance of our method on small and large simulated navigation problems.
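A minimal sketch of the decoupling, with all numbers assumed for illustration: if each navigation macro succeeds with probability close to 1 and failures are simply retried, its expected duration is nearly deterministic, and the global problem reduces to a traveling-salesman tour over delivery sites (brute-forced below; the paper's point is that any off-the-shelf routine can then be used).

```python
import itertools

# Hypothetical delivery instance: directed nominal travel times between
# depot 0 and sites 1-3, and a per-macro success probability.  A failed
# macro is retried, so with independent retries the expected duration of
# leg (i, j) is cost[i][j] / P_SUCCESS -- a nearly deterministic number.
P_SUCCESS = 0.95
cost = [
    [0, 4, 9, 5],
    [5, 0, 3, 7],
    [9, 4, 0, 2],
    [4, 7, 3, 0],
]

def expected_macro_cost(i, j):
    return cost[i][j] / P_SUCCESS

def best_tour(n):
    """Brute-force TSP over sites 1..n-1, starting and ending at depot 0."""
    def tour_cost(t):
        legs = zip((0,) + t, t + (0,))
        return sum(expected_macro_cost(i, j) for i, j in legs)
    best = min(itertools.permutations(range(1, n)), key=tour_cost)
    return best, tour_cost(best)

tour, t = best_tour(4)
print(tour, round(t, 2))
```

Because every leg is scaled by the same 1/P_SUCCESS factor here, the optimal tour is the same as in the deterministic problem; the paper's analytic conditions say when this kind of substitution is safe.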
Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs
 In Proceedings of the International Conference on Machine Learning
, 2002
"... One of the central challenges in reinforcement learning is to balance the exploration/exploitation tradeoff while scaling up to large problems. Although modelbased reinforcement learning has been less prominent than valuebased methods in addressing these challenges, recent progress has gener ..."
Abstract

Cited by 22 (0 self)
One of the central challenges in reinforcement learning is to balance the exploration/exploitation tradeoff while scaling up to large problems. Although model-based reinforcement learning has been less prominent than value-based methods in addressing these challenges, recent progress has generated renewed interest in pursuing model-based approaches: Theoretical work on the exploration/exploitation tradeoff has yielded provably sound model-based algorithms such as E³ and Rmax, while work on factored MDP representations has yielded model-based algorithms that can scale up to large problems. Recently the benefits of both achievements have been combined in the Factored E³ algorithm of Kearns and Koller. In this paper, we address a significant shortcoming of Factored E³: namely that it requires an oracle planner that cannot be feasibly implemented. We propose an alternative approach that uses a practical approximate planner, approximate linear programming, that maintains desirable properties. Further, we develop an exploration strategy that is targeted toward improving the performance of the linear programming algorithm, rather than an oracle planner. This leads to a simple exploration strategy that visits states relevant to tightening the LP solution, and achieves sample efficiency logarithmic in the size of the problem description. Our experimental results show that the targeted approach performs better than using approximate planning for implementing either Factored E³ or Factored Rmax.
Algorithms for Partially Observable Markov Decision Processes
 HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
, 2001
"... Partially Observable Markov Decision Process (POMDP) is a general sequential decisionmaking model where the effects of actions are... ..."
Abstract

Cited by 20 (1 self)
Partially Observable Markov Decision Process (POMDP) is a general sequential decision-making model where the effects of actions are...
Anytime planning for decentralized POMDPs using expectation maximization
 IN UAI
, 2010
"... Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. While finitehorizon DECPOMDPs have enjoyed significant success, progress remains slow for the infinitehorizon case mainly due to the inherent complexity of optimizing stochastic controllers representi ..."
Abstract

Cited by 15 (7 self)
Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.