Abstract: We discuss the temporal-difference learning algorithm, as applied to approximating the
cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze
updates parameters of a linear function approximator on-line, during a single endless trajectory
of an irreducible aperiodic Markov chain with a finite or infinite state space. We present
a proof of convergence (with probability 1), a characterization of the limit of convergence,
and a bound on the resulting approximation error. Furthermore,...
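As an illustration of the kind of algorithm the abstract describes, here is a minimal sketch of TD(λ), a common form of temporal-difference learning, with a linear function approximator updated on-line along one simulated trajectory. It is not taken from the report: the function name `td_lambda`, the per-state cost vector, the 1/(t+1) step-size schedule, the finite trajectory length, and the toy two-state chain are all assumptions chosen for brevity.

```python
import random


def td_lambda(P, cost, features, alpha=0.9, lam=0.5, steps=20000, seed=0):
    """Sketch of TD(lambda) with linear features on one simulated trajectory.

    P        -- row-stochastic transition matrix (list of lists)
    cost     -- cost incurred in each state (assumed per-state, not per-transition)
    features -- feature vector for each state
    alpha    -- discount factor, lam -- trace-decay parameter
    """
    rng = random.Random(seed)
    n_states = len(P)
    k = len(features[0])
    r = [0.0] * k   # parameters of the linear approximator
    z = [0.0] * k   # eligibility trace
    state = 0
    for t in range(steps):
        # Sample the next state from the current row of P.
        nxt = rng.choices(range(n_states), weights=P[state])[0]
        v_now = sum(ri * fi for ri, fi in zip(r, features[state]))
        v_next = sum(ri * fi for ri, fi in zip(r, features[nxt]))
        # Temporal difference for this transition.
        d = cost[state] + alpha * v_next - v_now
        # Decay the trace and add the current state's features.
        z = [alpha * lam * zi + fi for zi, fi in zip(z, features[state])]
        step = 1.0 / (t + 1)   # assumed diminishing step size
        r = [ri + step * d * zi for ri, zi in zip(r, z)]
        state = nxt
    return r


# Hypothetical two-state chain with one feature per state.
P = [[0.9, 0.1], [0.2, 0.8]]
cost = [1.0, 0.0]
features = [[1.0], [0.5]]
print(td_lambda(P, cost, features))
```

The approximate cost-to-go of state i is then the inner product of the returned parameter vector with `features[i]`. A truncated run such as this one only approximates the limit that the report characterizes.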
Cited by:
Least-Squares Temporal Difference Learning - Justin Boyan, NASA
Planning In Hybrid Structured Stochastic Domains - Comenius University
A Convergent Form of Approximate Policy Iteration - Theodore Perkins
Similar documents (at the sentence level):
48.7%: An Analysis of Temporal-Difference Learning with Function.. - Tsitsiklis, Van Roy (1996)
7.0%: Learning and Value Function Approximation in Complex Decision.. - Van Roy (1998)
Similar documents based on text:
0.5: Lids-P-2390 - May Average Cost
0.4: March2,1999 - On Average Versus
0.4: On Step-Size and Bias in Temporal-Difference Learning - Richard Sutton (1994)
Related documents from co-citation:
33: Learning to predict by the method of temporal differences - Sutton - 1988
29: Neuro-Dynamic Programming - Bertsekas, Tsitsiklis - 1996
18: Reinforcement learning: An introduction - Sutton, Barto - 1998
BibTeX entry:
J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, MIT, 1996. http://citeseer.ist.psu.edu/tsitsiklis96analysis.html
@techreport{tsitsiklis96analysis,
  author      = "John N. Tsitsiklis and Benjamin Van Roy",
  title       = "An analysis of temporal-difference learning with function approximation",
  institution = "MIT",
  number      = "LIDS-P-2322",
  year        = "1996",
  url         = "http://citeseer.ist.psu.edu/tsitsiklis96analysis.html"
}
Documents on the same site (http://anxiety-closet.mit.edu/people/jnt/publ.html):
Boundedness of Finitely Generated Matrix Semigroups is.. - Blondel, Tsitsiklis (1999)
A Neuro-Dynamic Programming Approach to Call Admission.. - Marbach, Tsitsiklis (1997)
Deciding Stability and Mortality of Piecewise.. - Blondel, Bournez, .. (1998)