Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition (2000)

by Thomas G. Dietterich
Venue: Journal of Artificial Intelligence Research
Citations: 442 (6 self)

BibTeX

@ARTICLE{Dietterich00hierarchicalreinforcement,
    author = {Thomas G. Dietterich},
    title = {Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition},
    journal = {Journal of Artificial Intelligence Research},
    year = {2000},
    volume = {13},
    pages = {227--303}
}


Abstract

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics (as a subroutine hierarchy) and a declarative semantics (as a representation of the value function of a hierarchical policy). MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent...

Keyphrases

maxq value function decomposition, hierarchical reinforcement learning, value function, hierarchical policy, target markov decision process, maxq decomposition, declarative semantics, reinforcement learning, subroutine hierarchy, target mdp, new approach, previous work, hierarchical reinforcement, maxq unifies, useful subgoals, additive combination, procedural semantics
