
An Analysis of Temporal-Difference Learning with Function Approximation (1996)  (63 citations)
John N. Tsitsiklis, Benjamin Van Roy



View or download:
mit.edu/people/jnt...97bvrtdpre.pdf

From:  mit.edu/people/jnt/publ

Abstract: We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line, during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. Furthermore,...
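
The on-line update described in the abstract can be illustrated with a minimal sketch of TD(lambda) with a linear function approximator, run along a single simulated trajectory. The abstract is truncated in this record, so everything specific below is an assumption made for illustration: the function name td_lambda_linear, the step-size schedule c / (c + t), the eligibility-vector form, and the two-state toy chain are not taken from this page.

```python
import random


def td_lambda_linear(P, g, phi, alpha, lam, steps, c=10.0, seed=0):
    """Estimate r so that sum_k phi[i][k] * r[k] approximates the cost-to-go J(i).

    P[i][j] : transition probabilities of the Markov chain
    g[i][j] : cost incurred on the transition i -> j
    phi[i]  : feature vector of state i
    alpha   : discount factor in (0, 1)
    lam     : trace-decay parameter lambda in [0, 1]
    c       : hypothetical step-size constant; the step at time t is c / (c + t)
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # parameters of the linear approximator
    i = 0                     # arbitrary initial state
    z = list(phi[i])          # eligibility vector
    for t in range(steps):
        # Simulate one transition of the chain.
        j = rng.choices(states, weights=P[i])[0]
        v_i = sum(f * w for f, w in zip(phi[i], r))
        v_j = sum(f * w for f, w in zip(phi[j], r))
        # Temporal difference for the observed transition.
        d = g[i][j] + alpha * v_j - v_i
        step = c / (c + t)
        # Move the parameters along the eligibility vector.
        r = [w + step * d * e for w, e in zip(r, z)]
        # Decay the eligibility vector and add the new state's features.
        z = [alpha * lam * e + f for e, f in zip(z, phi[j])]
        i = j
    return r


# Toy chain (an assumption): two states, uniform transitions,
# cost 1 when leaving state 0 and cost 0 when leaving state 1.
P = [[0.5, 0.5], [0.5, 0.5]]
g = [[1.0, 1.0], [0.0, 0.0]]
phi = [[1.0, 0.0], [0.0, 1.0]]   # one indicator feature per state
r = td_lambda_linear(P, g, phi, alpha=0.5, lam=0.0, steps=20000)
print(r)
```

For this toy chain the exact cost-to-go solves J = g + alpha * P * J, which gives J = [1.5, 0.5]. With indicator features the learned parameters should end up close to those values. With fewer features than states the limit is in general only an approximation of J, which is the situation the paper's error bound addresses.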

Cited by:
Least-Squares Temporal Difference Learning - Justin Boyan, NASA
Planning in Hybrid Structured Stochastic Domains - Comenius University
A Convergent Form of Approximate Policy Iteration - Theodore Perkins

Similar documents (at the sentence level):
48.7%:   An Analysis of Temporal-Difference Learning with Function.. - Tsitsiklis, Van Roy (1996)
7.0%:   Learning and Value Function Approximation in Complex Decision.. - Van Roy (1998)

Similar documents based on text:
0.5:   Lids-P-2390 - May Average Cost
0.4:   March2,1999 - On Average Versus
0.4:   On Step-Size and Bias in Temporal-Difference Learning - Richard Sutton (1994)

Related documents from co-citation:
33:   Learning to predict by the method of temporal differences - Sutton - 1988
29:   Neuro-Dynamic Programming - Bertsekas, Tsitsiklis - 1996
18:   Reinforcement learning: An introduction - Sutton, Barto - 1998

BibTeX entry:

J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, MIT, 1996. http://citeseer.ist.psu.edu/tsitsiklis96analysis.html

@techreport{tsitsiklis96analysis,
    author = "John N. Tsitsiklis and Benjamin Van Roy",
    title = "An analysis of temporal-difference learning with function approximation",
    institution = "MIT",
    number = "LIDS-P-2322",
    year = "1996",
    url = "http://citeseer.ist.psu.edu/tsitsiklis96analysis.html"
}


Documents on the same site (http://anxiety-closet.mit.edu/people/jnt/publ.html):
Boundedness of Finitely Generated Matrix Semigroups is.. - Blondel, Tsitsiklis (1999)
A Neuro-Dynamic Programming Approach to Call Admission.. - Marbach, Tsitsiklis (1997)
Deciding Stability and Mortality of Piecewise.. - Blondel, Bournez, .. (1998)
