Abstract: The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that generalize beyond the observed data. We provide a proof of convergence for a wide class of temporal...
Cited by:
Solving Factored MDPs via Non-Homogeneous Partitioning - Kee-Eung Kim And
Online Fitted Reinforcement Learning - Geoffrey Gordon Computer (1995)
Intensive Reinforcement Learning - Wawrzyński (2005)
Similar documents (at the sentence level):
22.2%: Approximate Solutions to Markov Decision Processes - Gordon (1999)
Active bibliography (related documents):
0.3: Truncated Temporal Differences with Function Approximation.. - Cichosz (1996)
0.3: Task Level Strategies for Robots - Narasimhan (1994)
0.3: Generalization in Reinforcement Learning: Safely Approximating .. - Boyan, Moore (1995)
Similar documents based on text:
0.2: Wavelet Neural Networks Are Asymptotically Optimal .. - Kreinovich.. (1992)
0.2: Fuzzy Systems With Defuzzification Are Universal Approximators - Castro, Delgado (1996)
0.1: Stable Fitted Reinforcement Learning - Geoffrey Gordon (1996)
Related documents from co-citation:
30: Generalization in reinforcement learning: safely approximating the value functio.. - Boyan, Moore - 1995
29: Residual algorithms: Reinforcement learning with function approximation - Baird - 1995
21: Feature-Based Methods for Large-Scale Dynamic Programming - Tsitsiklis, Van Roy - 1994
BibTeX entry:
G. J. Gordon. Stable function approximation in dynamic programming. In Machine Learning (proceedings of the twelfth international conference), San Francisco, CA, 1995. Morgan Kaufmann. http://citeseer.ist.psu.edu/gordon95stable.html
@inproceedings{gordon95stable,
  author = "Geoffrey J. Gordon",
  title = "Stable function approximation in dynamic programming",
  booktitle = "Proceedings of the Twelfth International Conference on Machine Learning",
  publisher = "Morgan Kaufmann",
  address = "San Francisco, CA",
  editor = "Armand Prieditis and Stuart Russell",
  pages = "261--268",
  year = "1995",
  url = "http://citeseer.ist.psu.edu/gordon95stable.html"
}
Citations (may not include all citations):
658: Learning from Delayed Rewards - Watkins - 1989
563: Learning to predict by the methods of temporal differences - Sutton - 1988
348: Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
303: Princeton University Press - Ford, Fulkerson et al. - 1962
281: Machine Learning - Watkins, Dayan - 1992
281: Machine Learning - Dayan, of et al. - 1992
115: The parti-game algorithm for variable resolution reinforceme.. - Moore - 1994
107: the convergence of stochastic iterative dynamic programming .. - Jaakkola, Jordan et al. - 1994
102: Generalization in reinforcement learning: safely approximati.. - Boyan, Moore - 1995
101: Adaptive Control Processes - Bellman - 1961
71: Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
66: Stable function approximation in dynamic programming - Gordon - 1995
61: Quarterly of Applied Mathematics - Bellman, problem - 1958
52: Variable resolution dynamic programming: efficiently learnin.. - Moore - 1991
21: Polynomial approximation --- a new computational technique i.. - Bellman, Kalaba et al. - 1963
19: An adaptive optimal controller for discrete-time Markov envi.. - Witten - 1977
16: Discounted dynamic programming - Blackwell - 1965
11: Neurogammon: a neural network backgammon program - Tesauro - 1990
4: values with basis function representations - Sabes - 1993
3: Technical note: an upper bound on the loss from approximate .. - Singh, Yee - 1994
3: An optimal multigrid algorithm for discrete-time stochastic .. - Chow, Tsitsiklis - 1989
2: Mathematical Tables and Aids to Computation - Bellman, Dreyfus et al. - 1959
Documents on the same site (http://www.cs.cmu.edu/~ggordon/):
Chattering in SARSA(lambda) - A CMU Learning Lab Internal Report - Gordon (1996)
Online Fitted Reinforcement Learning - Geoffrey Gordon (1995)
Stable Fitted Reinforcement Learning - Geoffrey Gordon (1996)