by Juan C. Santamaría, Richard S. Sutton, Ashwin Ram
Download: ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/SSR-98.ps.Z
Abstract:
A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state. The function is important because an agent can use this measure to decide what to do next. A common problem when reinforcement learning is applied to systems with continuous state and action spaces is that the value function must operate on a domain of real-valued variables, which means that it should be able to represent the value of infinitely many state-action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this paper, we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded function approximators to represent value functions and describe in detail three implementations: CMAC, instance-based, and case-based. Additionally, we discuss how function approximators having different degrees of resolution in different regions of the state and action spaces may influence the performance and learning efficiency of the agent. We propose a simple and modular technique for implementing function approximators with non-uniform degrees of resolution, so that they can represent the value function with higher accuracy in important regions of the state and action spaces. We performed extensive experiments on the double integrator and pendulum swing-up systems to demonstrate the proposed ideas.
Keywords: Reinforcement learning, function approximation, memory-based methods, continuous
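To make the idea of a sparse coarse-coded approximator concrete, the following is a minimal sketch of a CMAC-style (tile-coded) value function over one continuous state variable and one continuous action variable. It is an illustration under stated assumptions, not the paper's implementation: the class name `TileCodedQ`, its parameters (`n_tilings`, `tiles_per_dim`, `low`, `high`, `step_size`), and the uniform offset scheme are all hypothetical choices made for this example.

```python
import math


class TileCodedQ:
    """Hypothetical CMAC-style approximator for Q(state, action) in 2-D."""

    def __init__(self, n_tilings=8, tiles_per_dim=8,
                 low=(-1.0, -1.0), high=(1.0, 1.0)):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.low = low      # assumed lower bounds of (state, action)
        self.high = high    # assumed upper bounds of (state, action)
        # One extra tile per dimension so shifted tilings still cover the range.
        self.side = tiles_per_dim + 1
        self.weights = [[0.0] * (self.side * self.side)
                        for _ in range(n_tilings)]

    def _active_tiles(self, state, action):
        """Return one active tile index per tiling for the given pair."""
        active = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings  # shift, as a fraction of a tile width
            index = 0
            for x, lo, hi in zip((state, action), self.low, self.high):
                scaled = (x - lo) / (hi - lo) * self.tiles_per_dim + offset
                cell = min(max(int(math.floor(scaled)), 0), self.side - 1)
                index = index * self.side + cell
            active.append(index)
        return active

    def value(self, state, action):
        """Sum the weights of the active tiles (one per tiling)."""
        tiles = self._active_tiles(state, action)
        return sum(self.weights[t][i] for t, i in enumerate(tiles))

    def update(self, state, action, target, step_size=0.5):
        """Move the estimate toward target, spreading the error over tilings."""
        error = target - self.value(state, action)
        share = step_size * error / self.n_tilings
        for t, i in enumerate(self._active_tiles(state, action)):
            self.weights[t][i] += share


q = TileCodedQ()
q.update(0.1, 0.2, target=1.0, step_size=1.0)
near = q.value(0.12, 0.22)   # shares most tiles with the updated pair
far = q.value(-0.9, -0.9)    # shares no tiles with the updated pair
```

In this sketch a single update changes the estimate for nearby state-action pairs, because they fall in many of the same tiles, and leaves distant pairs untouched. That is the kind of generalization across both state and action spaces the abstract describes; a non-uniform resolution could be approximated by rescaling the inputs before tiling, though that is again an assumption rather than the authors' technique.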