On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Tommi Jaakkola, Michael I. Jordan
Department of ...



View or download:
berkeley.edu/~jord...mputation94.ps.gz

From: berkeley.edu/~jord...publications

Abstract: Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of...

Similar documents (at the sentence level):
55.6%:   On the Convergence of Stochastic Iterative Dynamic.. - Jaakkola, Jordan (1994)
12.2%:   Convergence of Stochastic Iterative Dynamic Programming.. - Jaakkola, Jordan, Singh (1994)

Active bibliography (related documents):
0.1:   A New Parameter Estimation Method for Gaussian Mixtures - Singer, Warmuth (1998)
0.1:   Update rules for parameter estimation in Bayesian networks - Bauer, Koller, Singer (1997)
0.1:   Learning To Solve Markovian Decision Processes - Singh (1994)

BibTeX entry:

@misc{ algorithms-convergence,
  author = "Tommi Jaakkola and Michael I. Jordan",
  title = "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms",
  url = "citeseer.ist.psu.edu/757046.html" }
Citations (may not include all citations):
658   Learning from delayed rewards - Watkins - 1989
563   Learning to predict by the methods of temporal differences - Sutton - 1988
348   Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
281   Machine Learning - Watkins, Dayan - 1992
257   Learning to act using real-time dynamic programming - Barto, Bradtke et al. - 1993
230   Dynamic Programming: Deterministic and Stochastic Models - Bertsekas - 1987
101   Applied Probability Models with Optimization Applications - Ross - 1970
71   Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1993
48   Approximate dynamic programming for real-time control and ne.. - Werbos - 1992
32   converges with probability - Dayan, Sejnowski - 1993
25   The convergence of TD - Dayan - 1992
23   Sequential decision problems and neural networks - Barto, Sutton et al. - 1990
12   On stochastic approximation - Dvoretzky - 1956
9   A stochastic approximation model - Robbins, Monro - 1951
7   Optimization of Stochastic Systems - Aoki - 1967
4   Department of Computer Science preprint - Peng, Williams - 1993

Documents on the same site (http://www.cs.berkeley.edu/~jordan/publications.html):
Improving the Mean Field Approximation via the Use of.. - Jaakkola, Jordan (1998)
Variational probabilistic inference and the QMR-DT database - Jaakkola, Jordan (1998)
Hidden Markov decision trees - Jordan, Ghahramani, Saul (1997)


CiteSeer.IST - Copyright Penn State and NEC