Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
Tommi Jaakkola...



View or download:
berkeley.edu/~jordan/pa...Nips94b.ps.gz

From:  berkeley.edu/~jord...publications

Abstract: Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither the algorithms nor the analyses generally remain usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the...


BibTeX entry:

@misc{ markov-reinforcement,
  author = "Tommi Jaakkola and others",
  title = "Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems",
  url = "citeseer.ist.psu.edu/757090.html" }
Citations (may not include all citations):
658   Learning from delayed rewards - Watkins - 1989
563   Learning to predict by the methods of temporal differences - Sutton - 1988
281   Machine Learning - Watkins, Dayan - 1992
230   Dynamic Programming: Deterministic and Stochastic Models - Bertsekas - 1987
107   the convergence of stochastic iterative Dynamic Programming .. - Jaakkola, Jordan et al. - 1994
106   A survey of partially observable Markov decision processes - Monahan - 1982
71   Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
56   Learning without state estimation in partially observable en.. - Singh, Jaakkola et al. - 1994
52   A reinforcement learning method for maximizing undiscounted .. - Schwartz - 1993
25   The convergence of TD - Dayan - 1992
10   Monte-Carlo matrix inversion and reinforcement learning - Barto, Duff - 1994

Documents on the same site (http://www.cs.berkeley.edu/~jordan/publications.html):
Improving the Mean Field Approximation via the Use of.. - Jaakkola, Jordan (1998)
Variational probabilistic inference and the QMR-DT database - Jaakkola, Jordan (1998)
Hidden Markov decision trees - Jordan, Ghahramani, Saul (1997)
