Results 1–10 of 88
Solving transition independent decentralized Markov decision processes. JAIR, 2004.
"... Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of thes ..."
Abstract

Cited by 107 (14 self)
 Add to MetaCart
(Show Context)
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
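The structural assumption that defines this class can be written compactly. A minimal sketch of the factorization the abstract describes, with the symbols (local states s_i, actions a_i, histories h_i, and the joint-reward term rho) chosen here for illustration rather than quoted from the paper:

    % Transition independence: each agent's local state evolves under its
    % own dynamics, unaffected by the other agents
    P(s' \mid s, a) = \prod_{i=1}^{n} P_i(s'_i \mid s_i, a_i)

    % Coupling enters only through the structured global reward, here split
    % into local terms plus a joint term over the agents' histories
    R = \sum_{i=1}^{n} R_i(s_i, a_i) + \rho(h_1, \ldots, h_n)

Because the dynamics factor per agent, each agent's policy can be evaluated against its own local MDP, and only the joint-reward term has to be coordinated.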
Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs. 2005.
"... In many realworld multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the realworld uncertainty in multiagent domains, they fail to exploit such locality ..."
Abstract

Cited by 96 (20 self)
 Add to MetaCart
In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent's limited interactions with a small number of neighbors. While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper presents a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Exploiting network structure enables us to present two novel algorithms for ND-POMDPs: a distributed policy generation algorithm that performs local search and a systematic policy search that is guaranteed to reach the global optimum.
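The locality of interaction that the abstract contrasts with flat distributed POMDPs can be sketched as a reward that decomposes over the links of the agent network; the notation below is illustrative, not the paper's:

    % E is the set of (hyper-)links between neighboring agents; s_l and a_l
    % are the states and actions of only the agents on link l
    R(s, a) = \sum_{l \in E} R_l(s_l, a_l)

With transitions and observations independent per agent, the value function inherits this graph structure, which is what lets DCOP-style local search and systematic search operate over neighborhoods instead of the full joint policy space.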
Collaborative Multiagent Reinforcement Learning by Payoff Propagation. Journal of Machine Learning Research, 2006.
"... In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose t ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
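The payoff propagation step lends itself to a compact sketch. Below is a minimal max-plus implementation on a pairwise coordination graph, under assumed, illustrative conventions (not the paper's code): n_actions[i] is agent i's action count, g[i][ai] a local payoff, f[(i, j)][ai][aj] an edge payoff supplied for both orderings of each edge, and neighbors a symmetric adjacency map (k in neighbors[i] iff i in neighbors[k]):

    def max_plus(n_actions, g, f, neighbors, iters=20):
        # mu[(i, j)][aj]: message agent i sends to neighbor j about j's action.
        mu = {(i, j): [0.0] * n_actions[j]
              for i in neighbors for j in neighbors[i]}
        for _ in range(iters):
            for (i, j), msg in mu.items():
                for aj in range(n_actions[j]):
                    # Maximize over i's own action: local payoff + edge payoff
                    # + incoming messages from i's other neighbors.
                    msg[aj] = max(
                        g[i][ai] + f[(i, j)][ai][aj]
                        + sum(mu[(k, i)][ai] for k in neighbors[i] if k != j)
                        for ai in range(n_actions[i]))
        # Each agent picks the action maximizing its payoff plus all messages.
        return {i: max(range(n_actions[i]),
                       key=lambda ai: g[i][ai]
                       + sum(mu[(k, i)][ai] for k in neighbors[i]))
                for i in neighbors}

On tree-structured graphs this converges to the exact joint maximizer; on graphs with cycles it is an approximation, and implementations normally normalize messages to keep them bounded (omitted here for brevity).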
Optimal and approximate Q-value functions for decentralized POMDPs. J. Artificial Intelligence Research.
"... Decisiontheoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In singleagent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Qvalue functions: an optimal Qvalue functi ..."
Abstract

Cited by 62 (26 self)
 Add to MetaCart
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
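For orientation, the approximate Q-value functions studied in this line of work (those of the underlying MDP, the underlying POMDP, and a Bayesian-game approximation) are usually presented as a chain of upper bounds over joint action-observation histories; the statement below is reconstructed from the abstract's claim using the standard Q_MDP/Q_POMDP/Q_BG notation of this literature, so treat it as a hedged paraphrase rather than a quotation:

    Q^*(\vec{\theta}, a) \le Q_{\mathrm{BG}}(\vec{\theta}, a)
        \le Q_{\mathrm{POMDP}}(\vec{\theta}, a) \le Q_{\mathrm{MDP}}(\vec{\theta}, a)

Each step in the chain corresponds to a stronger information assumption (roughly: delayed communication, free communication, full state observability), and granting more information can only raise the value estimate.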
Influence-based policy abstraction for weakly-coupled Dec-POMDPs. In International Conference on Automated Planning and Scheduling (ICAPS 2010), 2010.
"... Decentralized POMDPs are powerful theoretical models for coordinating agents ’ decisions in uncertain environments, but the generallyintractable complexity of optimal joint policy construction presents a significant obstacle in applying DecPOMDPs to problems where many agents face many policy choi ..."
Abstract

Cited by 38 (12 self)
 Add to MetaCart
(Show Context)
Decentralized POMDPs are powerful theoretical models for coordinating agents' decisions in uncertain environments, but the generally intractable complexity of optimal joint policy construction presents a significant obstacle in applying Dec-POMDPs to problems where many agents face many policy choices. Here, we argue that when most agent choices are independent of other agents' choices, much of this complexity can be avoided: instead of coordinating full policies, agents need only coordinate policy abstractions that explicitly convey the essential interaction influences. To this end, we develop a novel framework for influence-based policy abstraction for weakly-coupled transition-dependent Dec-POMDP problems that subsumes several existing approaches. In addition to formally characterizing the space of transition-dependent influences, we provide a method for computing optimal and approximately optimal joint policies. We present an initial empirical analysis, over problems with commonly studied flavors of transition-dependent influences, that demonstrates the potential computational benefits of influence-based abstraction over state-of-the-art optimal policy search methods.
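The computational idea can be sketched independently of the formal framework: group a neighbor's candidate policies by the influence they exert on the shared state features, and compute one best response per distinct influence instead of one per policy. The helpers below (influence_of, best_response) are hypothetical placeholders, not the paper's API:

    from collections import defaultdict

    # Hedged sketch: influence_of(pi) must return a hashable summary of how
    # policy pi affects the nonlocal features that other agents care about.
    def best_responses_by_influence(neighbor_policies, influence_of,
                                    best_response):
        groups = defaultdict(list)
        for pi in neighbor_policies:
            groups[influence_of(pi)].append(pi)
        # One best-response computation per distinct influence; the savings
        # grow with the number of policies collapsing to the same influence.
        return {infl: best_response(infl) for infl in groups}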
Analyzing Myopic Approaches for Multi-Agent Communication. In Proc., 2005.
"... Choosing when to communicate is a fundamental problem in multiagent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decisiontheoretic approach to this problem that balan ..."
Abstract

Cited by 28 (5 self)
 Add to MetaCart
(Show Context)
Choosing when to communicate is a fundamental problem in multiagent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption: that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multiagent systems. Key words: multiagent systems, decentralized MDPs, communication, decision-theoretic planning.
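The myopic test itself reduces to a one-line comparison. A minimal sketch, where V_comm and V_silent are assumed evaluation routines (expected values under the agent's current belief with and without communicating now) and cost is the message cost; none of these names come from the paper:

    # Myopic assumption: communication is treated as possible only at the
    # current step, so its value is estimated as a one-shot difference.
    def should_communicate(belief, V_comm, V_silent, cost):
        return V_comm(belief) - V_silent(belief) > cost

The abstract's point is that this estimate degrades exactly when deferring communication is valuable, since V_silent then misjudges the no-communication future; the proposed alternative relaxes the "only now" assumption when evaluating that branch.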
Lossless clustering of histories in decentralized POMDPs. In Proc. of the Eighth Int. Joint Conf. on Autonomous Agents and Multiagent Systems, 2009.
"... Decentralized partially observable Markov decision processes (DecPOMDPs) constitute a generic and expressive framework for multiagent planning under uncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories g ..."
Abstract

Cited by 26 (7 self)
 Add to MetaCart
(Show Context)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute a generic and expressive framework for multiagent planning under uncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories grows exponentially in the planning horizon. In this work, we identify a criterion that allows for lossless clustering of observation histories: i.e., we prove that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one. We show how this result can be exploited in optimal policy search and demonstrate empirically that it can provide a speedup of multiple orders of magnitude, allowing the optimal solution of significantly larger problems. We also perform an empirical analysis of the generality of our clustering method, which suggests that it may also be useful in other (approximate) Dec-POMDP solution methods.
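Read as code, the criterion suggests a direct implementation: key each observation history by the distribution it induces and merge collisions. A hedged sketch, with induced_distribution standing in for the paper's equivalence computation over hidden states and the other agents' histories:

    from collections import defaultdict

    def cluster_histories(histories, induced_distribution):
        # Histories with identical induced distributions have the same
        # optimal value and can be treated as one. Exact equality is assumed
        # here; a real implementation would compare within a tolerance.
        clusters = defaultdict(list)
        for h in histories:
            key = tuple(sorted(induced_distribution(h).items()))
            clusters[key].append(h)
        return list(clusters.values())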
Decentralized planning under uncertainty for teams of communicating agents. In Proc. AAMAS, 2006.
"... Decentralized partially observable Markov decision processes (DECPOMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and partially observable environment. Unfortunately, computing optimal plans in a DECPOMDP has been shown to be intractable (NEX ..."
Abstract

Cited by 23 (4 self)
 Add to MetaCart
Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and partially observable environment. Unfortunately, computing optimal plans in a DEC-POMDP has been shown to be intractable (NEXP-complete), and approximate algorithms for specific subclasses have been proposed. Many of these algorithms rely on an (approximate) solution of the centralized planning problem (i.e., treating the whole team as a single agent). We take a more decentralized approach, in which each agent only reasons over its own local state and some uncontrollable state features, which are shared by all team members. In contrast to other approaches, we model communication as an integral part of the agent's reasoning, in which the meaning of a message is directly encoded in the policy of the communicating agent. We explore iterative methods for approximately solving such models, and we conclude with some encouraging preliminary experimental results.
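One natural reading of the iterative methods mentioned at the end is alternating best response: hold every policy but one fixed, improve that one, and cycle until no agent can improve. This is a generic sketch of that scheme under stated assumptions, not the paper's algorithm; best_response and joint_value are hypothetical helpers:

    def alternating_best_response(policies, best_response, joint_value,
                                  max_rounds=50):
        value = joint_value(policies)
        for _ in range(max_rounds):
            improved = False
            for i in range(len(policies)):
                trial = list(policies)
                trial[i] = best_response(i, policies)
                new_value = joint_value(trial)
                if new_value > value:
                    policies, value = trial, new_value
                    improved = True
            if not improved:
                # Local optimum: no unilateral improvement exists.
                break
        return policies, value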