Abstract: Recent developments in the area of reinforcement learning have
yielded a number of new algorithms for the prediction and control of
Markovian environments. These algorithms, including the TD(λ) algorithm
of Sutton (1988) and the Q-learning algorithm of Watkins (1989),
can be motivated heuristically as approximations to dynamic programming
(DP). In this paper we provide a rigorous proof of convergence of
these DP-based learning algorithms by relating them to the powerful
techniques of stochastic...
Cited by:
Imitative Reinforcement Learning for Soccer Playing Robots - Latzke, Behnke, Bennewitz
Reinforcement Learning Algorithm for Partially Observable Markov..
Online Fitted Reinforcement Learning - Geoffrey Gordon (1995)
Similar documents (at the sentence level):
71.4%: On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
50.4%: On the Convergence of Stochastic Iterative Dynamic.. - Jaakkola, Jordan (1994)
12.5%: Convergence of Stochastic Iterative Dynamic Programming.. - Jaakkola, Jordan, Singh (1994)
Active bibliography (related documents):
0.1: An Analysis of non-Markov Automata Games: Implications for.. - Pendrith, McGarity (1997)
0.1: Incremental Multi-Step Q-Learning - Peng, Williams (1996)
0.1: Residual Q-Learning Applied to Visual Attention - Bandera et al.
Similar documents based on text:
0.1: Convergence Results for Single-Step On-Policy.. - Singh, Jaakkola et al. (1998)
0.1: Reinforcement Learning Algorithm for Partially Observable.. - Tommi Jaakkola (1995)
0.1: Learning To Solve Markovian Decision Processes - Singh (1994)
Related documents from co-citation:
57: Learning from Delayed Rewards - Watkins - 1989
41: Q-learning (Machine Learning) - Watkins, Dayan - 1992
39: Learning to predict by the methods of temporal differences - Sutton - 1988
BibTeX entry:
T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185--1201, 1994. http://citeseer.ist.psu.edu/article/jaakkola93convergence.html
@inproceedings{ jaakkola94convergence,
author = "Tommi Jaakkola and Michael I. Jordan and Satinder P. Singh",
title = "Convergence of Stochastic Iterative Dynamic Programming Algorithms",
booktitle = "Advances in Neural Information Processing Systems",
volume = "6",
publisher = "Morgan Kaufmann Publishers, Inc.",
editor = "Jack D. Cowan and Gerald Tesauro and Joshua Alspector",
pages = "703--710",
year = "1994",
url = "citeseer.ist.psu.edu/article/jaakkola93convergence.html" }
Citations (may not include all citations):
348: Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
281: Q-learning (Machine Learning) - Watkins, Dayan - 1992
257: Learning to act using real-time dynamic programming - Barto, Bradtke et al. - 1993
230: Dynamic Programming: Deterministic and Stochastic Models - Bertsekas - 1987
101: Applied Probability Models with Optimization Applications - Ross - 1970
48: Approximate dynamic programming for real-time control and ne.. - Werbos - 1992
32: TD(λ) converges with probability 1 - Dayan, Sejnowski - 1993
25: The convergence of TD(λ) for general λ - Dayan - 1992
23: Sequential decision problems and neural networks - Barto, Sutton et al. - 1990
9: A stochastic approximation method - Robbins, Monro - 1951
7: Optimization of Stochastic Systems - Aoki - 1967
6: Learning from delayed rewards - Watkins - 1989
2: Learning to predict by the methods of temporal differences - Sutton - 1988
2: On stochastic approximation - Dvoretzky - 1956