
Stable Function Approximation in Dynamic Programming (1995) (66 citations)
Geoffrey J. Gordon
Proceedings of the Twelfth International Conference on Machine Learning



View or download:
cmu.edu/user/ggord...l95stabledp.ps.Z
cmu.edu/pub_files/...ffrey_1995_2.ps.gz

From: cmu.edu/~ggordon/
Homepages: G.Gordon


Abstract: The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that generalize beyond the observed data. We provide a proof of convergence for a wide class of temporal...
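The abstract's central idea, running value iteration through a function approximator, can be illustrated with a small sketch. It assumes a hypothetical toy MDP (a deterministic 20-state chain with reward 1 for entering the last state) and a k-nearest-neighbour averaging approximator. Neither comes from the text above, and the sketch is not claimed to reproduce the paper's construction, since the abstract is cut off before it names the class of approximators. What it shows is this: each fitted value is a nonnegative weighted average of values stored at a few support states, with weights summing to 1. Such a fit cannot increase max-norm distances, so the combined "back up, then fit" update remains a gamma-contraction and the successive changes shrink at least geometrically.

```python
import numpy as np


def knn_weights(states, support, k):
    """Hypothetical k-nearest-neighbour averager.

    Returns a row-stochastic matrix W with W[i, j] = 1/k when support[j] is
    among the k support points nearest to states[i], else 0. Nonnegative rows
    summing to 1 make W a non-expansion in the max norm.
    """
    dists = np.abs(states[:, None] - support[None, :])
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    W = np.zeros((len(states), len(support)))
    rows = np.arange(len(states))[:, None]
    W[rows, nearest] = 1.0 / k
    return W


def fitted_value_iteration(n_states=20, support_step=3, k=2, gamma=0.9, iters=200):
    """Value iteration where values are stored only at support states and
    generalized to every state by the averager (toy chain MDP, assumed)."""
    states = np.arange(n_states)
    support = states[::support_step]           # states where values are stored
    W = knn_weights(states, support, k)        # support values -> all states
    goal = n_states - 1
    # Deterministic chain: action 0 moves left, action 1 moves right.
    next_state = np.stack([np.maximum(states - 1, 0),
                           np.minimum(states + 1, goal)])
    reward = np.where(next_state == goal, 1.0, 0.0)  # shape (2, n_states)

    theta = np.zeros(len(support))             # parameters = values at support
    gaps = []                                  # max-norm change per iteration
    for _ in range(iters):
        v = W @ theta                          # approximate value everywhere
        q = reward + gamma * v[next_state]     # one-step Bellman backups
        new_theta = q.max(axis=0)[support]     # targets at support states only
        gaps.append(np.max(np.abs(new_theta - theta)))
        theta = new_theta
    return W @ theta, gaps


v, gaps = fitted_value_iteration()
print(np.round(v, 3))
print(gaps[0], gaps[-1])
```

Swapping in an approximator whose weights can be negative or sum to more than 1 (for example, an unconstrained least-squares fit) removes the non-expansion argument this sketch relies on, and the gaps are then no longer guaranteed to shrink.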

Cited by:
Solving Factored MDPs via Non-Homogeneous Partitioning - Kee-Eung Kim And
Online Fitted Reinforcement Learning - Geoffrey Gordon (1995)
Intensive Reinforcement Learning - Wawrzyński (2005)

Similar documents (at the sentence level):
22.2%: Approximate Solutions to Markov Decision Processes - Gordon (1999)

Active bibliography (related documents):
0.3: Truncated Temporal Differences with Function Approximation.. - Cichosz (1996)
0.3: Task Level Strategies for Robots - Narasimhan (1994)
0.3: Generalization in Reinforcement Learning: Safely Approximating .. - Boyan, Moore (1995)

Similar documents based on text:
0.2: Wavelet Neural Networks Are Asymptotically Optimal .. - Kreinovich.. (1992)
0.2: Fuzzy Systems With Defuzzification Are Universal Approximators - Castro, Delgado (1996)
0.1: Stable Fitted Reinforcement Learning - Geoffrey Gordon (1996)

Related documents from co-citation:
30:   Generalization in reinforcement learning: safely approximating the value functio.. - Boyan, Moore - 1995
29:   Residual algorithms : Reinforcement learning with function approximation - Baird - 1995
21:   Feature-Based Methods for Large-Scale Dynamic Programming - Tsitsiklis, Van Roy - 1994

BibTeX entry:

G. J. Gordon. Stable function approximation in dynamic programming. In Machine Learning (proceedings of the twelfth international conference), San Francisco, CA, 1995. Morgan Kaufmann. http://citeseer.ist.psu.edu/gordon95stable.html

@inproceedings{ gordon95stable,
    author = "Geoffrey J. Gordon",
    title = "Stable function approximation in dynamic programming",
    booktitle = "Proceedings of the Twelfth International Conference on Machine Learning",
    publisher = "Morgan Kaufmann",
    address = "San Francisco, CA",
    editor = "Armand Prieditis and Stuart Russell",
    pages = "261--268",
    year = "1995",
    url = "citeseer.ist.psu.edu/gordon95stable.html" }
Citations (may not include all citations):
658   Learning from Delayed Rewards - Watkins - 1989
563   Learning to predict by the methods of temporal differences - Sutton - 1988
348   Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
303   Princeton University Press - Ford, Fulkerson et al. - 1962
281   Machine Learning - Watkins, Dayan - 1992
281   Machine Learning - Dayan, of et al. - 1992
115   The parti-game algorithm for variable resolution reinforceme.. - Moore - 1994
107   the convergence of stochastic iterative dynamic programming .. - Jaakkola, Jordan et al. - 1994
102   Generalization in reinforcement learning: safely approximati.. - Boyan, Moore - 1995
101   Adaptive Control Processes - Bellman - 1961
71   Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
66   Stable function approximation in dynamic programming - Gordon - 1995
61   Quarterly of Applied Mathematics - Bellman, problem - 1958
52   Variable resolution dynamic programming: efficiently learnin.. - Moore - 1991
21   Polynomial approximation --- a new computational technique i.. - Bellman, Kalaba et al. - 1963
19   An adaptive optimal controller for discrete-time Markov envi.. - Witten - 1977
16   Discounted dynamic programming - Blackwell - 1965
11   Neurogammon: a neural network backgammon program - Tesauro - 1990
4   values with basis function representations - Sabes - 1993
3   Technical note: an upper bound on the loss from approximate .. - Singh, Yee - 1994
3   An optimal multigrid algorithm for discrete-time stochastic .. - Chow, Tsitsiklis - 1989
2   Mathematical Tables and Aids to Computation - Bellman, Dreyfus et al. - 1959



Documents on the same site (http://www.cs.cmu.edu/~ggordon/):
Chattering in SARSA(lambda) - A CMU Learning Lab Internal Report - Gordon (1996)
Online Fitted Reinforcement Learning - Geoffrey Gordon (1995)
Stable Fitted Reinforcement Learning - Geoffrey Gordon (1996)
