  Analytical mean squared error curves for temporal difference learning (1998) [19 citations — 6 self]

by Satinder Singh, Peter Dayan
Machine Learning
ftp://ftp.cs.colorado.edu/users/baveja/Papers/MLjournal4.ps.gz

Abstract:

We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good.
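
For orientation, the sample-averaged curve that the abstract contrasts with the exact one can be sketched as follows. This is not the paper's analytical method: it is a Monte Carlo approximation that averages sample mean-square-error curves over a finite number of runs. The example process (a symmetric random walk with reward 1 on the right exit), the unweighted average over states, the start state, and every function name and parameter value are assumptions made for illustration.

```python
import random

def true_values(n):
    # Assumed example process: symmetric random walk over n states,
    # absorbing at both ends, reward 1 only on exiting to the right.
    # The undiscounted value of state i is its right-exit probability.
    return [(i + 1) / (n + 1) for i in range(n)]

def run_trial(n, rng):
    # One trial as a list of (state, reward, next_state) transitions;
    # next_state is None when the walk is absorbed.
    s = n // 2  # assumption: every trial starts in the middle state
    steps = []
    while True:
        nxt = s + (1 if rng.random() < 0.5 else -1)
        if nxt < 0:
            steps.append((s, 0.0, None))
            return steps
        if nxt >= n:
            steps.append((s, 1.0, None))
            return steps
        steps.append((s, 0.0, nxt))
        s = nxt

def offline_td0_update(v, steps, alpha):
    # Offline TD(0): increments are computed with the estimates frozen
    # for the whole trial and applied together once the trial ends.
    delta = [0.0] * len(v)
    for s, r, nxt in steps:
        target = r + (v[nxt] if nxt is not None else 0.0)
        delta[s] += alpha * (target - v[s])
    return [vi + di for vi, di in zip(v, delta)]

def empirical_mse_curve(n=5, alpha=0.1, trials=50, runs=200, seed=0):
    # Average the per-trial squared error (unweighted mean over states)
    # across independent runs; entry t is the error after t trials.
    rng = random.Random(seed)
    target = true_values(n)
    curve = [0.0] * (trials + 1)
    for _ in range(runs):
        v = [0.5] * n  # assumed initial estimates
        for t in range(trials + 1):
            curve[t] += sum((a - b) ** 2 for a, b in zip(v, target)) / n
            if t < trials:
                v = offline_td0_update(v, run_trial(n, rng), alpha)
    return [c / runs for c in curve]

if __name__ == "__main__":
    for t, mse in enumerate(empirical_mse_curve()):
        if t % 10 == 0:
            print(t, round(mse, 4))
```

Raising `runs` reduces the sampling noise in the averaged curve; the exact curve described in the abstract corresponds to the limit of infinitely many runs.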

Citations

938 Learning from Delayed Rewards – Watkins - 1989
885 Learning to Predict by the Methods of Temporal Differences – Sutton - 1988
162 On the convergence of stochastic iterative dynamic programming algorithms – Jaakkola, Jordan, et al. - 1994
147 Learning to predict by the methods of temporal differences – Sutton - 1988
136 Reinforcement learning with replacing eligibility traces – Singh, Sutton - 1996
135 Neuronlike elements that can solve difficult learning control problems – Barto, Sutton, et al. - 1983
105 Asynchronous stochastic approximation and Q-learning – Tsitsiklis - 1994
64 The convergence of TD(λ) for general λ – Dayan - 1992
56 Rigorous learning curve bounds from statistical mechanics – Haussler, Kearns, et al. - 1996
41 TD(λ) converges with probability 1 – Dayan, Sejnowski - 1994
27 Neuronlike elements that can solve difficult learning control problems – Barto, Sutton, et al. - 1983
19 The convergence of TD(λ) for general λ – Dayan - 1992
12 Monte Carlo matrix inversion and reinforcement learning – Barto, Duff - 1994
8 Temporal-difference methods and Markov models – Barnard - 1993
8 A note on the inversion of matrices by random walks – Wasow - 1952
2 Learning curve bounds for Markov decision processes with undiscounted rewards – Saul, Singh - 1996