Reinforcement Learning I: Introduction
, 1998
Cited by 2829 (76 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs from and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
Learning Invariance From Transformation Sequences
, 1991
Cited by 179 (2 self)
How can we consistently recognize objects when changes in the viewing angle, eye position, distance, size, orientation, relative position, or deformations of the object itself (e.g., of a newspaper or a gymnast) can change their retinal projections so significantly? The visual system must contain knowledge about such transformations in order to be able to generalize correctly. Part of this knowledge is probably determined genetically, but it is also likely that the visual system learns from its sensory experience, which contains plenty of examples of such transformations. Electrophysiological experiments suggest that the invariance properties of perception may be due to the receptive field characteristics of individual cells in the visual system. Complex cells in the primary visual cortex exhibit approximate invariance to position within a limited range (Hubel and Wiesel 1962), while cells in higher visual areas in the temporal cortex show more complex forms of invariance
Recent advances in hierarchical reinforcement learning
, 2003
Cited by 119 (18 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume
Adaptive Critics and the Basal Ganglia
- In
, 1995
Cited by 53 (5 self)
One of the most active areas of research in artificial intelligence is the study of learning methods by which “embedded agents” can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop interaction. An embedded agent has to make decisions under time pressure and uncertainty and has to learn without the help of an ever-present knowledgeable teacher. Although the novelty of this emphasis may be inconspicuous to a biologist, animals being the prototypical embedded agents, this emphasis is a significant departure from the more traditional focus in artificial intelligence on reasoning within circumscribed domains removed from the flow of real-world events. One consequence of the embedded agent view is the increasing interest in the learning paradigm called reinforcement learning (RL). Unlike the more widely studied supervised learning systems, which learn from a set of examples of correct input/output behavior, RL systems adjust their behavior with the goal of maximizing the frequency and/or magnitude of the reinforcing events they encounter over time. While the core ideas of modern RL come from theories of animal classical and instrumental
Reinforcement Learning And Its Application To Control
, 1992
Cited by 49 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning
- Journal of Artificial Intelligence Research
, 1995
Cited by 23 (8 self)
Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor λ. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines issues in the efficient and general implementation of TD(λ) for arbitrary λ, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(λ) but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using λ > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning.
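The idea of truncating TD(λ) can be illustrated with a short sketch. This is not the paper's TTD procedure, whose details are not given in the abstract; it is a generic m-step truncated λ-return, assuming a recorded window of rewards and current value estimates. The function name `truncated_lambda_return` and its parameters are hypothetical, chosen only for this illustration.

```python
def truncated_lambda_return(rewards, next_values, gamma, lam):
    """Compute a truncated lambda-return for the first step of a window.

    Illustrative sketch only (names and interface are assumptions):
      rewards[k]     -- reward received k steps into the window
      next_values[k] -- current value estimate of the state reached after step k
      gamma          -- discount factor
      lam            -- recency factor (lambda)
    """
    if not rewards or len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must be non-empty and equal in length")

    # The last step in the window bootstraps fully from the value estimate.
    z = rewards[-1] + gamma * next_values[-1]

    # Walk backwards, mixing the bootstrapped value and the longer return.
    for r, v in zip(reversed(rewards[:-1]), reversed(next_values[:-1])):
        z = r + gamma * ((1.0 - lam) * v + lam * z)
    return z


# With lam = 0 the result reduces to the one-step TD(0) target;
# with lam = 1 it is the m-step discounted return plus a bootstrapped tail.
td0_target = truncated_lambda_return([1.0, 1.0, 1.0], [0.0, 0.0, 10.0], 0.5, 0.0)
mstep_target = truncated_lambda_return([1.0, 1.0, 1.0], [0.0, 0.0, 10.0], 0.5, 1.0)
```

Because the window has fixed length, the cost per computed target is bounded by the window size, and the target can be fed to any function approximator as an ordinary regression target.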
A cerebellar model of timing and prediction in the control of reaching
- Neural Computation
, 1999
Cited by 20 (6 self)
A simplified model of the cerebellum was developed to explore its potential for adaptive, predictive control based on delayed feedback information. An abstract representation of a single Purkinje cell with multistable properties was interfaced, via a formalized premotor network, with a simulated single degree-of-freedom limb. The limb actuator was a nonlinear spring-mass system based on the nonlinear velocity dependence of the stretch reflex. By including realistic mossy fiber signals, as well as realistic conduction delays in afferent and efferent pathways, the model allowed the investigation of timing and predictive processes relevant to cerebellar involvement in the control of movement. The model regulates movement by learning to react in an anticipatory fashion to sensory feedback. Learning depends on training information generated from corrective movements and uses a temporally-asymmetric form of plasticity for the parallel fiber synapses on Purkinje cells.
A Neural Model Of Corticocerebellar Interactions During Attentive Imitation And Predictive Learning Of Sequential Handwriting Movements
, 2000
Cited by 13 (3 self)
Much sensory-motor behavior develops through imitation, as during the learning of handwriting by children. Such complex sequential acts are broken down into distinct motor control synergies, or muscle groups, whose activities overlap in time to generate continuous, curved movements that obey an inverse relation between curvature and speed. How are such complex movements learned through attentive imitation? Novel movements may be made as a series of distinct segments, but a practiced movement can be made smoothly, with a continuous, often bell-shaped, velocity profile. How does learning of complex movements transform reactive imitation into predictive, automatic performance? A neural model is developed which suggests how parietal and motor cortical mechanisms, such as difference vector encoding, interact with adaptively-timed, predictive cerebellar learning during movement imitation and predictive performance. To initiate movement, visual attention shifts along the shape to be imitated an...
A Predictive Switching Model of Cerebellar Movement Control
- Advances in Neural Information Processing Systems
, 1995
Cited by 10 (4 self)
We present a hypothesis about how the cerebellum could participate in regulating movement in the presence of significant feedback delays without resorting to a forward model of the motor plant. We show how a simplified cerebellar model can learn to control endpoint positioning of a nonlinear spring-mass system with realistic delays in both afferent and efferent pathways. The model's operation involves prediction, but instead of predicting sensory input, it directly regulates movement by reacting in an anticipatory fashion to input patterns that include delayed sensory feedback.

