Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition (2000)
| Venue: | Journal of Artificial Intelligence Research |
| Citations: | 307 - 6 self |
BibTeX
@ARTICLE{Dietterich00hierarchicalreinforcement,
author = {Thomas G. Dietterich},
title = {Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition},
journal = {Journal of Artificial Intelligence Research},
year = {2000},
volume = {13},
pages = {227--303}
}
Abstract
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics---as a subroutine hierarchy---and a declarative semantics---as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consisten...
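
The additive combination described in the abstract can be illustrated with a minimal sketch. It assumes the form in which the decomposition is usually stated, Q(i, s, a) = V(a, s) + C(i, s, a), where C is the expected reward for completing parent task i after child a terminates. The abstract itself does not spell this out. The task names, state name, and all numbers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level hierarchy: one composite task ("root") that
# chooses between two primitive actions. Not taken from the paper.
CHILDREN: Dict[str, List[str]] = {"root": ["left", "right"]}

# V(a, s) for primitive actions: expected one-step reward (made-up values).
PRIMITIVE_V: Dict[Tuple[str, str], float] = {
    ("left", "s0"): -1.0,
    ("right", "s0"): -1.0,
}

# C(i, s, a): expected reward for completing parent task i after child a
# terminates, starting from state s (made-up values).
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("root", "s0", "left"): -5.0,
    ("root", "s0", "right"): -2.0,
}


def q_value(task: str, state: str, child: str) -> float:
    """Q(i, s, a) = V(a, s) + C(i, s, a): the additive decomposition."""
    return value(child, state) + COMPLETION[(task, state, child)]


def value(task: str, state: str) -> float:
    """V(i, s): table lookup for primitives, greedy max over children otherwise."""
    if task not in CHILDREN:
        return PRIMITIVE_V[(task, state)]
    return max(q_value(task, state, child) for child in CHILDREN[task])


if __name__ == "__main__":
    print(value("root", "s0"))
```

Because each composite task stores only its own completion values, the value of the root is assembled recursively from the smaller tasks' values, which is the sense in which the target value function is an additive combination of the subtask value functions.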







