Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes (2006)

by Sridhar Mahadevan, Mauro Maggioni, Carlos Guestrin
Venue: Journal of Machine Learning Research
Citations: 92 (10 self)

BibTeX

@ARTICLE{Mahadevan06proto-valuefunctions:,
    author = {Sridhar Mahadevan and Mauro Maggioni and Carlos Guestrin},
    title = {Proto-value functions: A {L}aplacian framework for learning representation and control in {M}arkov decision processes},
    journal = {Journal of Machine Learning Research},
    year = {2006},
    volume = {8},
    pages = {2007}
}

Abstract

This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach in which global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain in scaling the proposed framework to large MDPs, and several elaborations of the framework are briefly summarized at the end.
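
As a rough illustration of component (ii), the sketch below builds an undirected graph from sampled state transitions and takes the smoothest eigenvectors of its graph Laplacian as basis functions. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `proto_value_functions`, the 10-state chain, the unit edge weights, the choice of the combinatorial Laplacian L = D - W, and the quadratic target function are all hypothetical choices made for the example.

```python
import numpy as np

def proto_value_functions(transitions, n_states, k):
    """Return the k smallest eigenvalues and eigenvectors of the
    combinatorial graph Laplacian L = D - W (an assumed choice)."""
    W = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        if s != s_next:
            # Symmetrize so the graph is undirected, with unit weights.
            W[s, s_next] = 1.0
            W[s_next, s] = 1.0
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:k], eigvecs[:, :k]

# Hypothetical sample set: one pass along a 10-state chain.
n = 10
transitions = [(s, s + 1) for s in range(n - 1)]
eigvals, basis = proto_value_functions(transitions, n, k=4)

# Fit an illustrative target function in the span of the basis
# by ordinary least squares (a stand-in for parameter estimation).
target = np.linspace(0.0, 1.0, n) ** 2
weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
approx = basis @ weights
```

On a connected graph the first basis function is constant and the later ones oscillate progressively faster, so a small `k` captures functions that vary smoothly over the state graph. A control method such as LSPI would estimate the weights from samples of rewards and transitions rather than from a known target as done here.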

