Results 1–10 of 43
Temporal sequence learning, prediction and control: a review of different models and their relation to biological mechanisms
Neural Computation, 2004
Abstract

Cited by 47 (4 self)
In this article we compare methods for temporal sequence learning (TSL) across the disciplines of machine control, classical conditioning, neuronal models for TSL, and spike-timing-dependent plasticity. This review will briefly introduce the most influential models and focus on two questions: 1) To what degree are reward-based (e.g. TD-learning) and correlation-based (Hebbian) learning related? and 2) How do the different models correspond to possibly underlying biological mechanisms of synaptic plasticity? We will first compare the different models in an open-loop condition, where behavioral feedback does not alter the learning. Here we observe that reward-based and correlation-based learning are indeed very similar. Machine control is then used to introduce the problem of closed-loop control (e.g. “actor-critic architectures”). Here the problem of evaluative (“rewards”) versus non-evaluative (“correlations”) feedback from the environment will be discussed, showing that both learning approaches are fundamentally different in the closed-loop condition. In trying to answer the second question, we will compare neuronal versions of the different learning architectures to the anatomy of the involved brain structures (basal ganglia, thalamus and …
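The open-loop similarity the review examines can be illustrated with a minimal TD(0) sketch (illustrative only, not from the paper; the episode structure, learning rate, and reward placement are assumptions): the reward prediction propagates backward to the earliest state, much as a correlation-based rule strengthens the earliest predictive input.

```python
import numpy as np

# Minimal TD(0) sketch: a three-step episode with a reward only at the
# final step. After learning, the value prediction has propagated back
# to the earliest state.
def td_values(n_trials=200, alpha=0.3, gamma=1.0):
    V = np.zeros(3)
    for _ in range(n_trials):
        for t in range(3):
            r = 1.0 if t == 2 else 0.0
            v_next = V[t + 1] if t < 2 else 0.0
            V[t] += alpha * (r + gamma * v_next - V[t])
    return V

V = td_values()  # all entries approach 1, earliest state last
```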
Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
Neural Computation, 2007
Abstract

Cited by 42 (0 self)
The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic Spike Response Model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), while the other involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks, regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated …
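The eligibility-trace variant can be sketched in a few lines (a hypothetical single-synapse illustration; the exponential STDP window, both time constants, and the reward timing are assumptions, not the paper's parameters):

```python
import numpy as np

# One pre-before-post pairing writes an STDP increment into an eligibility
# trace; the trace decays until a delayed global reward converts what is
# left of it into a weight change. All constants are illustrative.
tau_stdp, tau_e = 20.0, 500.0     # STDP window and trace time constants (ms)
A_plus, eta = 1.0, 0.01

t_pre, t_post, t_reward = 100.0, 105.0, 800.0
e = A_plus * np.exp(-(t_post - t_pre) / tau_stdp)  # pairing sets the trace
e *= np.exp(-(t_reward - t_post) / tau_e)          # trace decays until reward
w = 0.5 + eta * 1.0 * e                            # reward (=1) gates the update
```

Because the pairing is stored in the trace rather than applied immediately, the update still succeeds with the reward arriving 695 ms after the spikes.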
Optimal Spike-Timing-Dependent Plasticity for Precise Action Potential Firing in Supervised Learning
2006
Abstract

Cited by 37 (4 self)
In timing-based neural codes, neurons have to emit action potentials at precise moments in time. We use a supervised learning paradigm to derive a synaptic update rule that optimizes by gradient ascent the likelihood of postsynaptic firing at one or several desired firing times. We find that the optimal strategy of up- and downregulating synaptic efficacies depends on the relative timing between presynaptic spike arrival and desired postsynaptic firing. If the presynaptic spike arrives before the desired postsynaptic spike timing, our optimal learning rule predicts that the synapse should become potentiated. The dependence of the potentiation on spike timing directly reflects the time course of an excitatory postsynaptic potential. However, our approach gives no unique reason for synaptic depression under reversed spike timing. In fact, the presence and amplitude of depression of synaptic efficacies for reversed spike timing depend on how constraints are implemented in the optimization problem. Two different constraints, control of postsynaptic rates and control of temporal locality, are studied. The relation of our results to spike-timing-dependent plasticity and reinforcement learning is discussed.
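The abstract's central prediction — potentiation mirroring the EPSP time course for pre-before-desired timing, with no unique depression branch — can be sketched as follows (the double-exponential EPSP and its time constants are assumptions, not the paper's kernel):

```python
import numpy as np

# Predicted potentiation for presynaptic spikes relative to a desired
# postsynaptic firing time: it follows an (assumed) double-exponential
# EPSP for pre-before-desired timing and is left at zero otherwise,
# reflecting that depression is not uniquely determined.
def epsp(s, tau_m=10.0, tau_s=2.0):
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, np.exp(-s / tau_m) - np.exp(-s / tau_s), 0.0)

t_des = 50.0                              # desired firing time (ms)
t_pre = np.array([20.0, 45.0, 60.0])      # presynaptic spike times
dw = epsp(t_des - t_pre)                  # proposed potentiation per synapse
```

A spike 5 ms before the target (near the EPSP peak) is potentiated far more than one 30 ms early, and a spike after the target gets no prescribed change.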
Isotropic Sequence Order Learning
2003
Abstract

Cited by 24 (15 self)
In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used, such that all inputs are completely isotropic. All input signals are bandpass filtered before converging onto a linear output neuron. All synaptic weights change according to the correlation of bandpass-filtered inputs with the derivative of the output. We investigate the algorithm in an open- and a closed-loop condition, the latter being defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows analytically calculating the shape of the weight change, which is strictly heterosynaptic and follows the shape of the weight-change curves found in spike-timing-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically when no more temporal differences exist between the inputs, without additional normalizing measures. In the second part of this study, the algorithm is placed in an environment that leads to a closed sensorimotor loop. To this end, a robot is programmed with a prewired retraction reflex in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later-occurring collision signal. Synaptic weights stabilize at the end of learning, as theoretically predicted. Finally, we discuss the relation of ISO learning to other drive-reinforcement models and to the commonly used temporal-difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, “ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm” (pp. 865–884).
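The core update — weights driven by the correlation of a filtered input with the derivative of the output — can be sketched in discrete time (a hypothetical two-input setup; the simple low-pass filter standing in for the paper's bandpass filters, and all constants, are assumptions):

```python
import numpy as np

# ISO-style learning sketch: an early predictive input x1 precedes a
# later reflex input x0; the plastic weight w1 changes with the product
# of the filtered input u1 and the output derivative.
def lowpass(x, a=0.9):
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        y[i] = a * y[i - 1] + (1 - a) * x[i]
    return y

T = 200
x1 = np.zeros(T); x1[50] = 1.0     # early signal (e.g. range finder)
x0 = np.zeros(T); x0[60] = 1.0     # later reflex signal (e.g. collision)
u1, u0 = lowpass(x1), lowpass(x0)

mu, w1, w0 = 0.1, 0.0, 1.0         # reflex weight w0 stays fixed
for t in range(1, T):
    dv = (w0 * u0[t] + w1 * u1[t]) - (w0 * u0[t - 1] + w1 * u1[t - 1])
    w1 += mu * u1[t] * dv          # correlate input with output derivative
```

Because x1 precedes x0, the output derivative is positive while u1 is still active, so w1 grows, i.e. the early signal acquires predictive control.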
Reinforcement Learning, Spike-Time-Dependent Plasticity and the BCM Rule
2006
Abstract

Cited by 20 (2 self)
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, which directs the changes in appropriate directions. We apply a recently introduced policy-learning algorithm from machine learning to networks of spiking neurons, and derive a spike-time-dependent plasticity rule which ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule in several toy problems. Finally, through statistical analysis we show that the synaptic plasticity rule established is closely related to the widely used BCM rule, for which good biological evidence exists. 1 Policy Learning and Neuronal Dynamics. Reinforcement Learning (RL) is a general term used for a class of learning problems in which an agent attempts to improve its performance over time at a given task (e.g., Bertsekas and Tsitsiklis, 1996; Sutton and Barto, 1998). Formally, it is the problem of mapping situations to actions in order to maximize a given reward signal. The interaction between the agent and the environment is modelled mathematically as a Partially Ob…
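The BCM rule the analysis connects to can be written down directly (a standard textbook form with a sliding threshold; the linear neuron and all constants here are illustrative, not the paper's derivation):

```python
import numpy as np

# BCM-style update for a single linear synapse: potentiation when the
# postsynaptic activity exceeds a sliding threshold theta (a running
# average of post^2), depression below it. Constants are illustrative.
rng = np.random.default_rng(1)
eta, tau = 0.01, 0.99
w, theta = 0.5, 0.0
for _ in range(1000):
    pre = rng.random()
    post = w * pre
    theta = tau * theta + (1 - tau) * post ** 2   # sliding threshold
    w += eta * pre * post * (post - theta)
```

The sliding threshold makes the rule self-stabilizing: as the weight grows, theta rises with the squared output and eventually opposes further potentiation.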
Spike-timing-dependent plasticity: The relationship to rate-based learning for models with weight dynamics determined by a stable fixed point
Neural Computation, 2004
Cited by 10 (1 self)
SpikeTiming Dependent Plasticity for Evolved Robots
2003
Abstract

Cited by 10 (2 self)
Plastic spiking neural networks are synthesized for phototactic robots using evolutionary techniques. Synaptic plasticity depends asymmetrically on the precise relative timing between presynaptic and postsynaptic spikes at the millisecond range and on longer-term activity-dependent regulatory scaling. Comparative studies have been carried out for different kinds of plastic neural networks with low and high levels of neural noise. In all cases, the evolved controllers are highly robust against internal synaptic decay and other perturbations.
Reinforcement learning in continuous state and action space
2003
Abstract

Cited by 8 (2 self)
To solve complex navigation tasks, autonomous agents such as rats or mobile robots often employ spatial representations. These “maps” can be used for localisation and navigation. We propose a model for spatial learning and navigation based on reinforcement learning. The state space is represented by a population of hippocampal place cells, whereas a large number of locomotor neurons in nucleus accumbens forms the action space. Using overlapping receptive fields for both populations, state/action mappings rapidly generalise during learning. The population vector allows a continuous interpretation of both state and action spaces. An eligibility trace is used to propagate reward information back in time. It enables the modification of behaviours for recent states. We propose a biologically plausible mechanism for this trace of events, where spike-timing-dependent plasticity triggers the storing of recent state/action pairs. These pairs, however, are forgotten in the absence of a reward-related signal such as dopamine. The model is validated on a simulated robot platform.
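The place-cell state representation and population-vector readout can be sketched as follows (cell count, tiling, and tuning width are assumptions for a 1-D track, not the paper's parameters):

```python
import numpy as np

# A population of place cells with overlapping Gaussian receptive fields
# tiling a 1-D track; the population vector yields a continuous position
# estimate, the kind of smooth state space the model builds on.
centers = np.linspace(0.0, 1.0, 21)   # place-field centers
sigma = 0.08                          # tuning width

def place_activity(x):
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))

def population_vector(activity):
    return float(np.sum(activity * centers) / np.sum(activity))

x_hat = population_vector(place_activity(0.37))
```

Because the fields overlap, positions between cell centers decode smoothly, which is what lets state/action mappings generalise to nearby, never-visited states.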
A view of Neural Networks as dynamical systems
International Journal of Bifurcation and Chaos, 2009, http://lanl.arxiv.org/abs/0901.2203
Abstract

Cited by 4 (2 self)
We present some recent investigations resulting from the modelling of neural networks as dynamical systems, dealing with the following questions, addressed in the context of specific models: (i) characterizing the collective dynamics; (ii) statistical analysis of spike trains; (iii) interplay between dynamics and network structure; (iv) effects of synaptic plasticity.
Does Morphology Influence Temporal Plasticity?
ICANN 2002, LNCS 2415, 2002
Abstract

Cited by 2 (0 self)
Applying a bounded, weight-independent temporal plasticity rule to synapses from independent Poisson-firing presynaptic neurons onto a conductance-based integrate-and-fire neuron leads to a bimodal distribution of synaptic strengths (Song et al., 2000). We extend this model to investigate the effect of spreading the synapses over the dendritic tree. The results suggest that distal synapses tend to lose out to proximal ones in the competition for synaptic strength. Against expectations, versions of the plasticity rule with a smoother transition between potentiation and depression make little difference to the distribution, or lead to all synapses losing.
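The underlying additive (weight-independent, hard-bounded) STDP step from Song et al. (2000) that this study builds on can be sketched as follows (the constants are typical illustrative values, not the paper's):

```python
import numpy as np

# Additive STDP: fixed-size, weight-independent updates with hard bounds.
# Slight LTD dominance plus hard clipping is what drives synapses toward
# a bimodal distribution concentrated at the two extremes.
A_plus, A_minus, tau = 0.005, 0.00525, 20.0   # step sizes and window (ms)
w_max = 1.0

def stdp_update(w, dt_spike):
    """dt_spike = t_post - t_pre (ms); returns the clipped new weight."""
    if dt_spike > 0:
        w += A_plus * np.exp(-dt_spike / tau)    # pre-before-post: LTP
    else:
        w -= A_minus * np.exp(dt_spike / tau)    # post-before-pre: LTD
    return float(np.clip(w, 0.0, w_max))
```

Because the update size does not depend on w, only the hard bounds stop growth or decay, so individual synapses accumulate at 0 or w_max rather than settling at intermediate values.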