Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces

To appear in Adaptive Behavior 6:2.

by Juan C. Santamaría, Richard S. Sutton, Ashwin Ram
ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/SSR-98.ps.Z

Abstract:

A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state. The function is important because an agent can use this measure to decide what to do next. A common problem in reinforcement learning when applied to systems with continuous state and action spaces is that the value function must operate on a domain of real-valued variables, which means that it should be able to represent the value of infinitely many state-action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this paper, we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded function approximators to represent value functions and describe in detail three implementations: CMAC, instance-based, and case-based. Additionally, we discuss how function approximators with different degrees of resolution in different regions of the state and action spaces may influence the performance and learning efficiency of the agent. We propose a simple and modular technique for implementing function approximators with non-uniform degrees of resolution, so that they can represent the value function with higher accuracy in important regions of the state and action spaces. We performed extensive experiments on the double integrator and pendulum swing-up systems to demonstrate the proposed ideas.

Keywords: Reinforcement learning, function approximation, memory-based methods, continuous
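As an illustration of the sparse coarse coding mentioned in the abstract, the following is a minimal sketch of a CMAC-style (tile-coded) approximator for a value function over one continuous state variable and one continuous action variable. It is not the authors' implementation: the class name `TileCodedQ`, its parameters (`n_tilings`, `tiles_per_dim`, `lo`, `hi`, `alpha`), and the uniform offset scheme are all assumptions chosen for the example. The point it shows is that a single update changes the estimate for nearby state-action pairs while leaving distant ones untouched.

```python
import math


class TileCodedQ:
    """Hypothetical CMAC-style approximation of Q(s, a) for scalar s and a."""

    def __init__(self, n_tilings=8, tiles_per_dim=10,
                 lo=(-1.0, -1.0), hi=(1.0, 1.0)):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.lo, self.hi = lo, hi  # assumed bounds of (state, action)
        self.weights = {}          # sparse table: (tiling, coords) -> weight

    def active_tiles(self, s, a):
        """Return the single active tile in each tiling for input (s, a)."""
        tiles = []
        for t in range(self.n_tilings):
            # Each tiling is shifted by a different fraction of one tile width.
            offset = t / self.n_tilings
            coords = []
            for x, lo, hi in zip((s, a), self.lo, self.hi):
                scaled = (x - lo) / (hi - lo) * self.tiles_per_dim
                coords.append(math.floor(scaled + offset))
            tiles.append((t, tuple(coords)))
        return tiles

    def value(self, s, a):
        """Q estimate: sum of the weights of the active tiles."""
        return sum(self.weights.get(tile, 0.0)
                   for tile in self.active_tiles(s, a))

    def update(self, s, a, target, alpha=0.5):
        """Move the estimate at (s, a) a fraction alpha toward target."""
        error = target - self.value(s, a)
        step = alpha * error / self.n_tilings
        for tile in self.active_tiles(s, a):
            self.weights[tile] = self.weights.get(tile, 0.0) + step


q = TileCodedQ()
q.update(0.23, 0.13, target=1.0, alpha=1.0)
trained = q.value(0.23, 0.13)   # the updated point itself
nearby = q.value(0.25, 0.13)    # shares most, but not all, tiles
far = q.value(-0.8, 0.9)        # shares no tiles with the updated point
```

In this sketch, resolution is uniform everywhere; the non-uniform resolution discussed in the abstract would require a different mapping from inputs to tiles, which is not attempted here.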
