Abstract: On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple...
Cited by:
Planning In Hybrid Structured Stochastic - Domains Comenius University
Towards a Unified Theory of State Abstraction for MDPs - Li, Walsh, Littman (2006)
A Reinforcement Learning Algorithm based on Policy Iteration for.. - Gosavi (2004)
Similar documents (at the sentence level):
11.4%: Reinforcement Learning for 3 vs. 2 Keepaway - Stone, Sutton, Singh
Active bibliography (related documents):
0.2: Learning From State Differences: - Weaver, Baxter (1999)
0.2: STD(λ): learning state differences with TD(λ) - Weaver, Baxter
0.2: Minimum-Time Control of the Acrobot - Boone (1997)
Similar documents based on text:
0.1: Efficient Value Function Approximation For Reinforcement Learning - Wang (1998)
0.1: Model-Based Reinforcement Learning with an Approximate.. - Kuvayev, Sutton (1996)
0.1: An Analysis of Temporal-Difference Learning with Function.. - Tsitsiklis, Van Roy (1996)
Related documents from co-citation:
42: Learning from Delayed Rewards - CJCH - 1989
33: Generalization in reinforcement learning: safely approximating the value functio.. - Boyan, Moore - 1995
33: Learning to predict by the methods of temporal differences - Sutton - 1988
BibTeX entry:
R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8. MIT Press, 1996. http://citeseer.ist.psu.edu/sutton96generalization.html
@inproceedings{ sutton96generalization,
author = "Richard S. Sutton",
title = "Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding",
booktitle = "Advances in Neural Information Processing Systems",
volume = "8",
publisher = "The {MIT} Press",
editor = "David S. Touretzky and Michael C. Mozer and Michael E. Hasselmo",
pages = "1038--1044",
year = "1996",
url = "citeseer.ist.psu.edu/sutton96generalization.html" }
Citations (may not include all citations):
563: Learning to predict by the methods of temporal differences - Sutton - 1988
219: Practical issues in temporal difference learning - Tesauro - 1992
141: Temporal Credit Assignment in Reinforcement Learning - Sutton - 1984
135: Self-improving reactive agents based on reinforcement learni.. - Lin - 1992
124: Improving elevator performance using reinforcement learning - Crites, Barto - 1996
102: Generalization in reinforcement learning: Safely approximati.. - Boyan, Moore - 1995
84: Residual Algorithms: Reinforcement Learning with Function Ap.. - Baird - 1995
84: Neuronlike elements that can solve difficult learning contro.. - Barto, Sutton et al. - 1983
80: A reinforcement learning approach to job-shop scheduling - Zhang, Dietterich - 1995
66: Stable function approximation in dynamic programming - Gordon - 1995
59: Feature-based methods for large-scale dynamic programming - Tsitsiklis, Van Roy - 1994
59: learning using connectionist systems - Rummery, Niranjan - 1994
33: and Robotics - Albus - 1981
25: The convergence of TD - Dayan - 1992
25: Online learning with random representations - Sutton, Whitehead - 1993
13: A counterexample to temporal differences learning - Bertsekas - 1995
12: CMAC-based adaptive critic self-learning control - Lin - 1991
10: Reinforcement learning for planning and control - Dean, Basye et al. - 1992
4: Swinging up the acrobot: An example of intelligent control - DeJong, Spong - 1994
1: New York: Wiley - Learning, Vidyasagar et al. - 1989
1: CMAC: An associative neural network alternative to backpropa.. - Networks, Miller et al. - 1990
1: Reinforcement learning with replacing eligibility traces - CUED, TR et al. - 1996
1: Learning from Delayed Rewards - LIDS-P, Cambridge et al. - 1989