Reinforcement learning algorithms for MDPs
2009
Abstract
This article presents a survey of reinforcement learning algorithms for Markov Decision Processes (MDPs). In the first half of the article, the problem of value estimation is considered. Here we start by describing the idea of bootstrapping and temporal difference learning. Next, we compare incremental and batch algorithmic variants and discuss the impact of the choice of the function approximation method on the success of learning. In the second half, we describe methods that target the problem of learning to control an MDP. Here online and active learning are discussed first, followed by a description of direct and actor-critic methods.
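The bootstrapping idea behind temporal difference learning can be illustrated with tabular TD(0), which moves each state's estimate toward r + γV(s′), using the current estimate of the successor state as part of the target. This is a standard textbook sketch, not code from the survey; the five-state random-walk chain, the step size `alpha`, and the episode count are illustrative assumptions. Stepping off the right end pays 1 and every other transition pays 0, so the true values are 1/6, 2/6, ..., 5/6.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) value estimation on a random-walk chain (illustrative setup).

    States 1..num_states are non-terminal; 0 and num_states + 1 are terminal.
    Each step moves left or right with equal probability. Leaving through the
    right end gives reward 1; every other reward is 0.
    """
    rng = random.Random(seed)
    # Index 0 and index num_states + 1 are terminal and are never updated (stay 0).
    V = [0.0] + [0.5] * num_states + [0.0]
    start = (num_states + 1) // 2
    for _ in range(episodes):
        s = start
        while 0 < s <= num_states:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == num_states + 1 else 0.0
            # Bootstrapping: the target reuses the current estimate V[s_next].
            target = r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])
            s = s_next
    return V[1:num_states + 1]


# Compare against the true values 1/6, 2/6, ..., 5/6.
print([round(v, 2) for v in td0_random_walk()])
```

A smaller `alpha` lowers the steady-state noise of the estimates at the cost of slower initial progress, which is one of the trade-offs behind the incremental-versus-batch comparison.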
Reverse iterative deepening for finite-horizon MDPs with large branching factors
In ICAPS’12, 2012
Abstract
In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primarily based on domain determinization, a technique more suited to goal-oriented MDPs with small branching factors. Moreover, large branching factors render the existing implementations of RTDP- and LAO*-style algorithms inefficient as well. In this paper we present GLUTTON, our planner at IPPC-2011 that performed well on these challenging MDPs. The main algorithm used by GLUTTON is LR²TDP, an LRTDP-based optimal algorithm for finite-horizon problems centered around the novel idea of reverse iterative deepening. We detail LR²TDP itself as well as a series of optimizations included in GLUTTON that help LR²TDP achieve competitive performance on difficult problems with large branching factors: subsampling the transition function, separating out natural dynamics, caching transition function samples, and others. Experiments show that GLUTTON and PROST, the IPPC-2011 winner, have complementary strengths, with GLUTTON demonstrating superior performance on problems with few high-reward terminal states.
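A rough illustration of why solving horizons in increasing order pays off, assuming that reverse iterative deepening means solving the initial state for horizons 1, 2, ..., H in turn: the optimal value with h steps to go, V_h, depends only on V_{h-1}, so each new horizon reuses the previous one and a usable answer exists whenever the loop is stopped. This minimal sketch uses full dynamic-programming sweeps over a tiny explicit MDP. LR²TDP instead runs LRTDP trials and touches only reachable states, and none of GLUTTON's optimizations (transition subsampling, natural dynamics, caching) appear here. The dictionary layout, the toy MDP, and the function name `solve_increasing_horizons` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: transitions[state][action] -> list of (probability, next_state, reward)
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def solve_increasing_horizons(mdp: MDP, s0: str, max_horizon: int):
    """Compute V_h for h = 1..max_horizon, each layer reusing V_{h-1}.

    V_h(s) is the optimal expected total reward with h steps to go, and V_0 = 0.
    After layer h finishes, (h, V_h(s0), best first action at s0) is recorded,
    so stopping the loop early still leaves the answer for the last full horizon.
    """
    V_prev = {s: 0.0 for s in mdp}  # V_0
    answers = []
    for h in range(1, max_horizon + 1):
        V_cur, best = {}, {}
        for s, actions in mdp.items():
            # One-step lookahead against the (h - 1)-steps-to-go values.
            q = {a: sum(p * (r + V_prev[s2]) for p, s2, r in outcomes)
                 for a, outcomes in actions.items()}
            best[s] = max(q, key=q.get)
            V_cur[s] = q[best[s]]
        answers.append((h, V_cur[s0], best[s0]))
        V_prev = V_cur
    return answers


toy = {
    "a": {"stay": [(1.0, "a", 1.0)], "go": [(0.5, "b", 0.0), (0.5, "a", 0.0)]},
    "b": {"stay": [(1.0, "b", 4.0)]},
}
for h, value, action in solve_increasing_horizons(toy, "a", 3):
    print(h, value, action)
```

With one step to go the safe reward of 1 wins, but from two steps onward the gamble on reaching the high-reward state `b` is better, so the best first action depends on the horizon, which is exactly what a finite-horizon planner must track.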
Scaling Multiagent Markov Decision Processes (Extended Abstract)
Abstract
Markov Decision Processes (MDPs) have proved to be useful and general models of optimal decision-making in uncertain domains. However, approaches to solving MDPs using reinforcement learning that depend on storing the optimal value function and action models as tables do not scale to large state spaces. Three computational obstacles prevent the use of standard approaches when dealing with problems with many variables. First, the state space (and the time required for convergence) grows exponentially in the number of variables. This makes computing the value function impractical or impossible in terms of both memory and time. Second, the space of possible actions is exponential in the number of agents, so even one-step look-ahead search is computationally expensive. Lastly, exact computation of the expected value of the next state is slow, as the number of possible future states is exponential in the number of variables. These three obstacles are referred to as ...
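The three obstacles are easy to quantify. This sketch assumes n binary state variables and m agents that each choose independently among k actions (all illustrative assumptions, not figures from the paper). It counts the states (2^n), the joint actions (k^m), and the terms in one exact one-step lookahead from a single state, in the worst case where every state is a possible successor.

```python
def mdp_sizes(num_binary_vars: int, num_agents: int, actions_per_agent: int):
    """Worst-case sizes behind the three obstacles.

    Assumes binary state variables and agents that each pick independently
    from the same number of actions.
    """
    states = 2 ** num_binary_vars                    # obstacle 1: state space
    joint_actions = actions_per_agent ** num_agents  # obstacle 2: joint action space
    # Obstacle 3: an exact one-step lookahead from one state evaluates every joint
    # action and, in the worst case, sums over every possible next state.
    lookahead_terms = joint_actions * states
    return states, joint_actions, lookahead_terms


print(mdp_sizes(20, 10, 5))  # (1048576, 9765625, 10240000000000)
```

Even this modest problem (20 variables, 10 agents with 5 actions each) needs about 10^13 terms for a single exact lookahead from a single state, which is why tabular methods stop being an option.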
Algorithms, Experimentation
Abstract
We consider the setting of multiple collaborative agents trying to complete a set of tasks as assigned by a centralized controller. We propose a scalable method called “Assignment-based decomposition,” which is based on decomposing the problem of action selection into an upper assignment level and a lower task execution level. The assignment problem is solved by search, while the task execution is solved through coordinated reinforcement learning. We show that this decomposition of the overall problem into two levels scales well and outperforms state-of-the-art approaches, including pure assignment-level search and pure coordinated reinforcement learning. We also show how this approach enables transfer learning from domains with few agents to domains with many agents.
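The two-level split can be sketched as an upper-level search over assignments of agents to tasks, scored by a lower-level estimate of how well each team would execute its task. In this minimal sketch the search is exhaustive, several agents may share one task, and `task_value` is a hypothetical callable standing in for the values the paper learns with coordinated reinforcement learning; the paper's actual search procedure and value representation are not reproduced.

```python
from itertools import product


def best_assignment(agents, tasks, task_value):
    """Exhaustive upper-level search over agent -> task assignments.

    task_value(task, team) is a hypothetical stand-in for the lower
    task-execution level: it returns the estimated value of `team`
    (a list of agents) working on `task`.
    """
    best, best_score = None, float("-inf")
    for choice in product(tasks, repeat=len(agents)):
        # Group agents by the task they were assigned to.
        teams = {t: [a for a, c in zip(agents, choice) if c == t] for t in tasks}
        score = sum(task_value(t, team) for t, team in teams.items() if team)
        if score > best_score:
            best, best_score = dict(zip(agents, choice)), score
    return best, best_score


# Hypothetical lower-level values indexed by team size (0 to 3 agents).
VALUES = {"t1": [0, 1, 5, 5], "t2": [0, 2, 3, 3]}
print(best_assignment(["u1", "u2", "u3"], ["t1", "t2"],
                      lambda t, team: VALUES[t][len(team)]))
# ({'u1': 't1', 'u2': 't1', 'u3': 't2'}, 7)
```

Exhaustive search grows as |tasks|^|agents|, so a larger system would need a smarter search; the sketch only shows the division of labor, where the upper level never reasons about low-level actions and the lower level only ever sees one task and its team.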
Transfer Learning via Relational Templates
Abstract
Transfer learning refers to learning knowledge in one domain that can be applied to a different domain. In this paper, we view transfer learning as generalization of knowledge in a richer representation language that includes multiple subdomains as parts of the same superdomain. We employ relational templates of different specificity to learn pieces of additive value functions. We show significant transfer of learned knowledge across different subdomains of a real-time strategy game by generalizing the value function using relational templates.
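A value function assembled from templates of different specificity can be sketched as a sum of weights, with one weight table per template. The templates here (`unit` and `unit_vs_enemy`), the engagement encoding, and the update rule are hypothetical stand-ins for the paper's relational templates, chosen only to show the transfer mechanism: weights learned for a general template still apply in a subdomain whose specific combinations were never seen.

```python
from typing import Callable, Dict, Hashable, List

Engagement = Dict[str, str]

# Hypothetical relational templates, ordered from general to specific.
TEMPLATES: Dict[str, Callable[[Engagement], Hashable]] = {
    "unit": lambda e: e["unit"],
    "unit_vs_enemy": lambda e: (e["unit"], e["enemy"]),
}


class TemplateValue:
    """Additive value: V(s) = sum over engagements e in s and templates T of w_T[T(e)]."""

    def __init__(self, templates, alpha=0.1):
        self.templates = templates
        self.alpha = alpha
        self.w = {name: {} for name in templates}  # one weight table per template

    def value(self, state: List[Engagement]) -> float:
        # Unseen keys contribute 0, so only templates with learned weights matter.
        return sum(self.w[n].get(t(e), 0.0)
                   for e in state for n, t in self.templates.items())

    def update(self, state: List[Engagement], target: float) -> None:
        # Gradient step on squared error toward a given target (e.g. a TD target).
        err = target - self.value(state)
        for e in state:
            for n, t in self.templates.items():
                key = t(e)
                self.w[n][key] = self.w[n].get(key, 0.0) + self.alpha * err


model = TemplateValue(TEMPLATES)
seen = [{"unit": "archer", "enemy": "knight"}]
unseen = [{"unit": "archer", "enemy": "dragon"}]
for _ in range(200):
    model.update(seen, 1.0)
print(round(model.value(seen), 3), round(model.value(unseen), 3))  # 1.0 0.5
```

After training only on archer-versus-knight engagements, the unseen archer-versus-dragon engagement still receives the value carried by the general `unit` template, while the specific template contributes nothing until that combination is experienced.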

