  Integrated Learning and Planning Based on Truncating Temporal Differences

ftp://tichy.ise.pw.edu.pl/pub/pawel/ttd-plan.ps.gz

Abstract:

Reinforcement learning systems learn to act in an uncertain environment by executing actions and observing their long-term effects. A large number of time steps may be required before this trial-and-error process converges to a satisfactory policy. It is highly desirable to minimize the number of experiences the system needs to learn its task, particularly if errors are costly. One approach is to use hypothetical experiences, which require some additional computation but may reduce the necessary number of much more costly real experiences. This paper revisits the well-known idea of augmenting reinforcement learning by planning in the context of truncated TD(λ), or TTD, a simple computational technique that allows reinforcement learning algorithms based on the methods of temporal differences to learn considerably faster with essentially no additional computational expense. Two different ways of combining TTD with planning are proposed, which make it possible to benefit from λ > 0 in both the learning and planning processes. The algorithms are evaluated experimentally on a family of grid path-finding tasks and are shown to yield a considerable reduction in the number of real interactions with the environment necessary to converge, as well as improved scaling properties.
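
To make the idea of a truncated temporal-difference return concrete, here is a minimal Python sketch. It assumes the usual recursive form of the TD(λ) return, cut off after a fixed window of m steps and bootstrapped from the current value estimates. The function name `truncated_lambda_return` and its argument layout are hypothetical and are not taken from the paper, which may organize the computation differently.

```python
def truncated_lambda_return(rewards, next_values, gamma, lam):
    """Truncated lambda-return over a window of m = len(rewards) steps.

    rewards[k]     -- reward observed at step t+k (hypothetical layout)
    next_values[k] -- current value estimate of the state reached at step t+k+1
    gamma          -- discount factor
    lam            -- the lambda parameter, 0 <= lam <= 1
    """
    if not rewards or len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must be non-empty and equal in length")

    # Last step of the window: an ordinary one-step return.
    z = rewards[-1] + gamma * next_values[-1]

    # Walk backwards through the earlier steps, mixing the return computed
    # so far (weight lam) with the bootstrapped estimate (weight 1 - lam).
    for r, v in zip(reversed(rewards[:-1]), reversed(next_values[:-1])):
        z = r + gamma * (lam * z + (1.0 - lam) * v)
    return z


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [2.0, 0.0, 5.0]
    # lam = 0 reduces to the one-step return r_t + gamma * V(x_{t+1}).
    print(truncated_lambda_return(rewards, next_values, gamma=0.5, lam=0.0))
    # lam = 1 gives the m-step discounted return plus the bootstrapped tail.
    print(truncated_lambda_return(rewards, next_values, gamma=0.5, lam=1.0))
```

Under these assumptions the cost per update is a single backward pass over the window, so any λ between 0 and 1 costs the same to compute.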

Citations

938 Learning from Delayed Rewards – Watkins - 1989
885 Learning to Predict by the Methods of Temporal Differences – Sutton - 1988
377 Neuronlike adaptive elements that can solve difficult learning control problems – Barto, Sutton, et al. - 1983
374 Integrated Architecture for Learning, Planning, and Reacting Based on Approximating Dynamic Programming – Sutton - 1990
306 Automatic programming of behavior-based robots using reinforcement learning – Mahadevan, Connell - 1991
239 Prioritized sweeping: Reinforcement learning with less data and less real time – Moore, Atkeson - 1993
239 Generalization in reinforcement learning: Successful examples using sparse tile coding – Sutton - 1996
212 Temporal Credit Assignment in Reinforcement Learning – Sutton - 1984
174 Reinforcement learning for robots using neural networks – Lin - 1993
80 Efficient Learning and Planning Within the Dyna Framework – Peng, Williams - 1993
22 Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning – Cichosz - 1995
7 Fast and efficient reinforcement learning with truncated temporal differences – Mulawka - 1995
3 Integrated architectures for learning, planning, and reacting based on approximating TD(lambda), EPrint at http://citeseer.nj.nec.com/392847.html – Cichosz, Mulawka - 1995
1 Truncated temporal differences with function approximation: Successful examples using CMAC – Cichosz - 1996