Results 1 - 10
of
176
Efficient Solution Algorithms for Factored MDPs
, 2003
"... This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the re ..."
Abstract
-
Cited by 172 (3 self)
- Add to MetaCart
(Show Context)
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
Reinforcement learning for RoboCup-soccer keepaway
- Adaptive Behavior
, 2005
"... 1 RoboCup simulated soccer presents many challenges to reinforcement learning methods, in-cluding a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our appli-cation of episodic SMD ..."
Abstract
-
Cited by 134 (36 self)
- Add to MetaCart
(Show Context)
1 RoboCup simulated soccer presents many challenges to reinforcement learning methods, in-cluding a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our appli-cation of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers, ” tries to keep control of the ball for as long as possible despite the efforts of “the takers. ” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
Coordinated Reinforcement Learning
- In Proceedings of the ICML-2002 The Nineteenth International Conference on Machine Learning
, 2002
"... We present several new algorithms for multiagent reinforcement learning. A common feature of these algorithms is a parameterized, structured representation of a policy or value function. This structure is leveraged in an approach we call coordinated reinforcement learning, by which agents coordinate ..."
Abstract
-
Cited by 113 (6 self)
- Add to MetaCart
(Show Context)
We present several new algorithms for multiagent reinforcement learning. A common feature of these algorithms is a parameterized, structured representation of a policy or value function. This structure is leveraged in an approach we call coordinated reinforcement learning, by which agents coordinate both their action selection activities and their parameter updates. Within the limits of our parametric representations, the agents will determine a jointly optimal action without explicitly considering every possible action in their exponentially large joint action space. Our methods differ from many previous reinforcement learning approaches to multiagent coordination in that structured communication and coordination between agents appears at the core of both the learning algorithm and the execution architecture. Our experimental results, comparing our approach to other RL methods, illustrate both the quality of the policies obtained and the additional benefits of coordination. 1.
Generalizing plans to new environments in relational MDPs
- In International Joint Conference on Artificial Intelligence (IJCAI-03
, 2003
"... A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for dire ..."
Abstract
-
Cited by 113 (2 self)
- Add to MetaCart
(Show Context)
A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, by classes of agents. This class-based approximate value function is optimized relative to a sampled subset of environments, and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger, environments, with minimal loss of performance relative to environment-specific planning. We demonstrate our approach on a real strategic computer war game. 1
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract
-
Cited by 91 (6 self)
- Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Decentralized control of cooperative systems: Categorization and complexity analysis
- Journal of Artificial Intelligence Research
, 2004
"... Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general pr ..."
Abstract
-
Cited by 89 (9 self)
- Add to MetaCart
(Show Context)
Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems. 1.
A robust architecture for distributed inference in sensor networks
, 2005
"... Abstract — Many inference problems that arise in sensor networks require the computation of a global conclusion that is consistent with local information known to each node. A large class of these problems— including probabilistic inference, regression, and control problems—can be solved by message ..."
Abstract
-
Cited by 73 (2 self)
- Add to MetaCart
Abstract — Many inference problems that arise in sensor networks require the computation of a global conclusion that is consistent with local information known to each node. A large class of these problems— including probabilistic inference, regression, and control problems—can be solved by message passing on a data structure called a junction tree. In this paper, we present a distributed architecture for solving these problems that is robust to unreliable communication and node failures. In this architecture, the nodes of the sensor network assemble themselves into a junction tree and exchange messages between neighbors to solve the inference problem efficiently and exactly. A key part of the architecture is an efficient distributed algorithm for optimizing the choice of junction tree to minimize the communication and computation required by inference. We present experimental results from a prototype implementation on a 97-node Mica2 mote network, as well as simulation results for three applications: distributed sensor calibration, optimal control, and sensor field modeling. These experiments demonstrate that our distributed architecture can solve many important inference problems exactly, efficiently, and robustly. I.
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose t ..."
Abstract
-
Cited by 65 (2 self)
- Add to MetaCart
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcementlearning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection, result in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.