Results 1  10
of
65
Efficient Solution Algorithms for Factored MDPs
, 2003
"... This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the re ..."
Abstract

Cited by 174 (4 self)
 Add to MetaCart
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and contextspecific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomialsized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on maxnorm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing stateoftheart approach, showing, in some problems, exponential gains in computation time.
Generalizing plans to new environments in relational MDPs
 In International Joint Conference on Artificial Intelligence (IJCAI03
, 2003
"... A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for dire ..."
Abstract

Cited by 111 (2 self)
 Add to MetaCart
(Show Context)
A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, by classes of agents. This classbased approximate value function is optimized relative to a sampled subset of environments, and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger, environments, with minimal loss of performance relative to environmentspecific planning. We demonstrate our approach on a real strategic computer war game. 1
Solving transition independent decentralized Markov decision processes
 JAIR
, 2004
"... Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of thes ..."
Abstract

Cited by 107 (14 self)
 Add to MetaCart
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents ’ transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms. 1.
Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs
, 2005
"... In many realworld multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the realworld uncertainty in multiagent domains, they fail to exploit such locality ..."
Abstract

Cited by 96 (20 self)
 Add to MetaCart
In many realworld multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the realworld uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper present a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (NDPOMDPs). Exploiting network structure enables us to present two novel algorithms for NDPOMDPs: a distributed policy generation algorithm that performs local search and a systematic policy search that is guaranteed to reach the global optimal.
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract

Cited by 91 (6 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finitehorizon discrete POMDP is PSPACEcomplete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose t ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the singlestate case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decisionmaking analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decisionmaking tasks. We introduce different modelfree reinforcementlearning techniques, unitedly called Sparse Cooperative Qlearning, which approximate the global actionvalue function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edgebased decomposition of the actionvalue function and the payoff propagation algorithm for efficient action selection, result in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcementlearning methods based on temporal differences.
Role Allocation and Reallocation in Multiagent Teams: Towards A Practical Analysis
 In AAMAS
, 2003
"... Despite the success of the BDI approach to agent teamwork, initial role allocation (i.e. deciding which agents to allocate to key roles in the team) and role reallocation upon failure remain open challenges. What remain missing are analysis techniques to aid human developers in quantitatively compar ..."
Abstract

Cited by 55 (11 self)
 Add to MetaCart
(Show Context)
Despite the success of the BDI approach to agent teamwork, initial role allocation (i.e. deciding which agents to allocate to key roles in the team) and role reallocation upon failure remain open challenges. What remain missing are analysis techniques to aid human developers in quantitatively comparing different initial role allocations and competing role reallocation algorithms. To remedy this problem, this paper makes three key contributions. First, the paper introduces RMTDP (Rolebased Markov Team Decision Problem), an extension to MTDP [9], for quantitative evaluations of role allocation and reallocation approaches. Second, the paper illustrates an RMTDPbased methodology for not only comparing two competing algorithms for role reallocation, but also for identifying the types of domains where each algorithm is suboptimal, how much each algorithm can be improved and at what computational cost (complexity). Such algorithmic improvements are identified via a new automated procedure that generates a family of locally optimal policies for comparative evaluations. Third, since there are combinatorially many initial role allocations, evaluating each in RMTDP to identify the best is extremely difficult. Therefore, we introduce methods to exploit task decompositions among subteams to significantly prune the search space of initial role allocations. We present experimental results from two distinct domains.
Hybrid BDIPOMDP framework for multiagent teaming
 JAIR
"... Many current largescale multiagent team implementations can be characterized as following the “beliefdesireintention ” (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. ..."
Abstract

Cited by 37 (9 self)
 Add to MetaCart
Many current largescale multiagent team implementations can be characterized as following the “beliefdesireintention ” (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but the complexity of finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDIPOMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Rolebased Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our
Noncommunicative multirobot coordination in dynamic environments
 Robotics and Autonomous Systems
, 2005
"... Within a group of cooperating agents the decision making of an individual agent depends on the actions of the other agents. In dynamic environments, these dependencies will change rapidly as a result of the continuously changing state. Via a contextspecific decomposition of the problem into smaller ..."
Abstract

Cited by 30 (10 self)
 Add to MetaCart
(Show Context)
Within a group of cooperating agents the decision making of an individual agent depends on the actions of the other agents. In dynamic environments, these dependencies will change rapidly as a result of the continuously changing state. Via a contextspecific decomposition of the problem into smaller subproblems, coordination graphs offer scalable solutions to the problem of multiagent decision making. In this work, we apply coordination graphs to a continuous (robotic) domain by assigning roles to the agents and then coordinating the different roles. Moreover, we demonstrate that, with some additional assumptions, an agent can predict the actions of the other agents, rendering communication superfluous. We have successfully implemented the proposed method into our UvA Trilearn simulated robot soccer team which won the RoboCup2003 World Championship in
Solving factored MDPs with hybrid state and action variables
 J. Artif. Intell. Res. (JAIR
"... Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model tha ..."
Abstract

Cited by 29 (4 self)
 Add to MetaCart
(Show Context)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scaleup potential on several hybrid optimization problems. 1.