On the Convergence of Stochastic Iterative Dynamic Programming Algorithms (1993)  (107 citations)
Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
Advances in Neural Information Processing Systems



Abstract: Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic...
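
Illustration (not part of the CiteSeer record or the paper): the relationship the abstract describes, between Q-learning and dynamic programming, can be sketched on a toy problem. The two-state MDP, the names TRANSITIONS, GAMMA, sample, value_iteration and q_learning, the discount factor, and the step-size schedule n^(-0.7) are all assumptions chosen for this example. The step sizes are picked so that their sum diverges while the sum of their squares converges, which is the usual stochastic-approximation requirement. The sketch compares sampled Q-learning updates with the exact DP fixed point; it is a minimal sketch under those assumptions, not the authors' construction.

```python
import random

# Hypothetical 2-state, 2-action MDP (an assumption for illustration only).
# TRANSITIONS[s][a] is a list of (probability, next_state, reward) triples.
TRANSITIONS = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],
        1: [(0.2, 0, 0.0), (0.8, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)],
        1: [(0.5, 0, 2.0), (0.5, 1, 0.0)]},
}
GAMMA = 0.5  # assumed discount factor


def sample(s, a, rng):
    """Draw (next_state, reward) for taking action a in state s."""
    u = rng.random()
    cum = 0.0
    for p, s2, r in TRANSITIONS[s][a]:
        cum += p
        if u < cum:
            return s2, r
    return s2, r  # fall back to the last outcome if rounding leaves u >= cum


def value_iteration(sweeps=200):
    """Exact DP: repeatedly apply the Bellman optimality operator to Q."""
    q = {s: {a: 0.0 for a in TRANSITIONS[s]} for s in TRANSITIONS}
    for _ in range(sweeps):
        q = {s: {a: sum(p * (r + GAMMA * max(q[s2].values()))
                        for p, s2, r in TRANSITIONS[s][a])
                 for a in TRANSITIONS[s]}
             for s in TRANSITIONS}
    return q


def q_learning(steps=200_000, seed=0):
    """Sampled version of the same backup, with decaying step sizes."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in TRANSITIONS[s]} for s in TRANSITIONS}
    visits = {s: {a: 0 for a in TRANSITIONS[s]} for s in TRANSITIONS}
    s = 0
    for _ in range(steps):
        a = rng.choice(list(TRANSITIONS[s]))  # uniform random exploration
        s2, r = sample(s, a, rng)
        visits[s][a] += 1
        # Step size n^(-0.7): sum diverges, sum of squares converges.
        alpha = visits[s][a] ** -0.7
        target = r + GAMMA * max(q[s2].values())
        q[s][a] += alpha * (target - q[s][a])
        s = s2
    return q


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    for s in TRANSITIONS:
        for a in TRANSITIONS[s]:
            print(s, a, round(q_star[s][a], 3), round(q_hat[s][a], 3))
```

Running the script prints the DP value next to the learned estimate for each state-action pair; with the assumed schedule and uniform exploration the two columns should be close, though how close depends on the seed and the number of steps.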

Cited by:
Imitative Reinforcement Learning for Soccer Playing Robots - Latzke, Behnke, Bennewitz
Reinforcement Learning Algorithm for Partially Observable Markov..
Online Fitted Reinforcement Learning - Geoffrey Gordon (1995)

Similar documents (at the sentence level):
71.4%:   On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
50.4%:   On the Convergence of Stochastic Iterative Dynamic.. - Jaakkola, Jordan (1994)
12.5%:   Convergence of Stochastic Iterative Dynamic Programming.. - Jaakkola, Jordan, Singh (1994)

Active bibliography (related documents):
0.1:   An Analysis of non-Markov Automata Games: Implications for.. - Pendrith, McGarity (1997)
0.1:   Incremental Multi-Step Q-Learning - Peng, Williams (1996)
0.1:   Residual Q-Learning Applied to Visual Attention - Bandera et al.

Similar documents based on text:
0.1:   Convergence Results for Single-Step On-Policy.. - Singh, Jaakkola et al. (1998)
0.1:   Reinforcement Learning Algorithm for Partially Observable.. - Tommi Jaakkola (1995)
0.1:   Learning To Solve Markovian Decision Processes - Singh (1994)

Related documents from co-citation:
57:   Learning from Delayed Rewards - Watkins - 1989
41:   Q-learning (Machine Learning) - Watkins, Dayan - 1992
39:   Learning to predict by the method of temporal differences - Sutton - 1988

BibTeX entry:

T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185--1201, 1994. http://citeseer.ist.psu.edu/article/jaakkola93convergence.html

@inproceedings{ jaakkola94convergence,
    author = "Tommi Jaakkola and Michael I. Jordan and Satinder P. Singh",
    title = "Convergence of Stochastic Iterative Dynamic Programming Algorithms",
    booktitle = "Advances in Neural Information Processing Systems",
    volume = "6",
    publisher = "Morgan Kaufmann Publishers, Inc.",
    editor = "Jack D. Cowan and Gerald Tesauro and Joshua Alspector",
    pages = "703--710",
    year = "1994",
    url = "citeseer.ist.psu.edu/article/jaakkola93convergence.html" }

Citations (may not include all citations):
348   Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
281   Q-learning (Machine Learning) - Watkins, Dayan - 1992
257   Learning to act using real-time dynamic programming - Barto, Bradtke et al. - 1993
230   Dynamic Programming: Deterministic and Stochastic Models - Bertsekas - 1987
101   Applied Probability Models with Optimization Applications - Ross - 1970
48   Approximate dynamic programming for real-time control and ne.. - Werbos - 1992
32   TD(λ) converges with probability 1 - Dayan, Sejnowski - 1993
25   The convergence of TD(λ) for general λ - Dayan - 1992
23   Sequential decision problems and neural networks - Barto, Sutton et al. - 1990
9   A stochastic approximation method - Robbins, Monro - 1951
7   Optimization of Stochastic Systems - Aoki - 1967
6   Learning from delayed rewards - Watkins - 1989
2   Learning to predict by the method of temporal differences - Sutton - 1988
2   On stochastic approximation - Dvoretzky - 1956





Documents on the same site (http://signal.kuamp.kyoto-u.ac.jp/~kazushi/techrep.htm):
Of Expressions; It Means That the Convolution Product - May Be
Synchronization and Desynchronization in a Network of Locally .. - Campbell, Wang (1996)
Learning and Generalization Theories of Large.. - Monasson, Zecchina (1996)


CiteSeer.IST - Copyright Penn State and NEC