Results 1–10 of 43
Temporal sequence learning, prediction and control: a review of different models and their relation to biological mechanisms
Neural Computation, 2004
Abstract

Cited by 47 (4 self)
In this article we compare methods for temporal sequence learning (TSL) across the disciplines of machine control, classical conditioning, neuronal models for TSL, and spike-timing-dependent plasticity. This review will briefly introduce the most influential models and focus on two questions: 1) To what degree are reward-based (e.g. TD-learning) and correlation-based (Hebbian) learning related? and 2) How do the different models correspond to possibly underlying biological mechanisms of synaptic plasticity? We will first compare the different models in an open-loop condition, where behavioral feedback does not alter the learning. Here we observe that reward-based and correlation-based learning are indeed very similar. Machine control is then used to introduce the problem of closed-loop control (e.g. “actor-critic architectures”). Here the problem of evaluative (“rewards”) versus non-evaluative (“correlations”) feedback from the environment will be discussed, showing that both learning approaches are fundamentally different in the closed-loop condition. In trying to answer the second question, we will compare neuronal versions of the different learning architectures to the anatomy of the involved brain structures (basal ganglia, thalamus and …
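The open-loop similarity the review examines can be illustrated with a minimal TD(0) sketch (illustrative only, not from the paper; the episode structure, learning rate, and reward placement are assumptions): the reward prediction propagates backward to the earliest state, much as a correlation-based rule strengthens the earliest predictive input.

```python
import numpy as np

# Minimal TD(0) sketch: a three-step episode with a reward only at the
# final step. After learning, the value prediction has propagated back
# to the earliest state.
def td_values(n_trials=200, alpha=0.3, gamma=1.0):
    V = np.zeros(3)
    for _ in range(n_trials):
        for t in range(3):
            r = 1.0 if t == 2 else 0.0
            v_next = V[t + 1] if t < 2 else 0.0
            V[t] += alpha * (r + gamma * v_next - V[t])
    return V

V = td_values()  # all entries approach 1, earliest state last
```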
Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
Neural Computation, 2007
Abstract

Cited by 42 (0 self)
The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic Spike Response Model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), while the other involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks, regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated …
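The eligibility-trace variant can be sketched in a few lines (a hypothetical single-synapse illustration; the exponential STDP window, both time constants, and the reward timing are assumptions, not the paper's parameters):

```python
import numpy as np

# One pre-before-post pairing writes an STDP increment into an eligibility
# trace; the trace decays until a delayed global reward converts what is
# left of it into a weight change. All constants are illustrative.
tau_stdp, tau_e = 20.0, 500.0     # STDP window and trace time constants (ms)
A_plus, eta = 1.0, 0.01

t_pre, t_post, t_reward = 100.0, 105.0, 800.0
e = A_plus * np.exp(-(t_post - t_pre) / tau_stdp)  # pairing sets the trace
e *= np.exp(-(t_reward - t_post) / tau_e)          # trace decays until reward
w = 0.5 + eta * 1.0 * e                            # reward (=1) gates the update
```

Because the pairing is stored in the trace rather than applied immediately, the update still succeeds with the reward arriving 695 ms after the spikes.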
Optimal Spike-Timing-Dependent Plasticity for Precise Action Potential Firing in Supervised Learning
2006
Abstract

Cited by 37 (4 self)
In timing-based neural codes, neurons have to emit action potentials at precise moments in time. We use a supervised learning paradigm to derive a synaptic update rule that optimizes by gradient ascent the likelihood of postsynaptic firing at one or several desired firing times. We find that the optimal strategy of up- and downregulating synaptic efficacies depends on the relative timing between presynaptic spike arrival and desired postsynaptic firing. If the presynaptic spike arrives before the desired postsynaptic spike timing, our optimal learning rule predicts that the synapse should become potentiated. The dependence of the potentiation on spike timing directly reflects the time course of an excitatory postsynaptic potential. However, our approach gives no unique reason for synaptic depression under reversed spike timing. In fact, the presence and amplitude of depression of synaptic efficacies for reversed spike timing depend on how constraints are implemented in the optimization problem. Two different constraints, control of postsynaptic rates and control of temporal locality, are studied. The relation of our results to spike-timing-dependent plasticity and reinforcement learning is discussed.
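The abstract's central prediction — potentiation mirroring the EPSP time course for pre-before-desired timing, with no unique depression branch — can be sketched as follows (the double-exponential EPSP and its time constants are assumptions, not the paper's kernel):

```python
import numpy as np

# Predicted potentiation for presynaptic spikes relative to a desired
# postsynaptic firing time: it follows an (assumed) double-exponential
# EPSP for pre-before-desired timing and is left at zero otherwise,
# reflecting that depression is not uniquely determined.
def epsp(s, tau_m=10.0, tau_s=2.0):
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, np.exp(-s / tau_m) - np.exp(-s / tau_s), 0.0)

t_des = 50.0                              # desired firing time (ms)
t_pre = np.array([20.0, 45.0, 60.0])      # presynaptic spike times
dw = epsp(t_des - t_pre)                  # proposed potentiation per synapse
```

A spike 5 ms before the target (near the EPSP peak) is potentiated far more than one 30 ms early, and a spike after the target gets no prescribed change.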
Isotropic Sequence Order Learning
2003
Abstract

Cited by 24 (15 self)
In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used, such that all inputs are completely isotropic. All input signals are bandpass filtered before converging onto a linear output neuron. All synaptic weights change according to the correlation of bandpass-filtered inputs with the derivative of the output. We investigate the algorithm in an open- and a closed-loop condition, the latter being defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows analytically calculating the shape of the weight change, which is strictly heterosynaptic and follows the shape of the weight-change curves found in spike-timing-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically when no more temporal differences exist between the inputs, without additional normalizing measures. In the second part of this study, the algorithm is placed in an environment that leads to a closed sensorimotor loop. To this end, a robot is programmed with a prewired retraction reflex in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later-occurring collision signal. Synaptic weights stabilize at the end of learning, as theoretically predicted. Finally, we discuss the relation of ISO learning to other drive-reinforcement models and to the commonly used temporal-difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, “ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm” (pp. 865–884).
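The core update — weights driven by the correlation of a filtered input with the derivative of the output — can be sketched in discrete time (a hypothetical two-input setup; the simple low-pass filter standing in for the paper's bandpass filters, and all constants, are assumptions):

```python
import numpy as np

# ISO-style learning sketch: an early predictive input x1 precedes a
# later reflex input x0; the plastic weight w1 changes with the product
# of the filtered input u1 and the output derivative.
def lowpass(x, a=0.9):
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        y[i] = a * y[i - 1] + (1 - a) * x[i]
    return y

T = 200
x1 = np.zeros(T); x1[50] = 1.0     # early signal (e.g. range finder)
x0 = np.zeros(T); x0[60] = 1.0     # later reflex signal (e.g. collision)
u1, u0 = lowpass(x1), lowpass(x0)

mu, w1, w0 = 0.1, 0.0, 1.0         # reflex weight w0 stays fixed
for t in range(1, T):
    dv = (w0 * u0[t] + w1 * u1[t]) - (w0 * u0[t - 1] + w1 * u1[t - 1])
    w1 += mu * u1[t] * dv          # correlate input with output derivative
```

Because x1 precedes x0, the output derivative is positive while u1 is still active, so w1 grows, i.e. the early signal acquires predictive control.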
Reinforcement Learning, Spike-Time-Dependent Plasticity and the BCM Rule
2006
Abstract

Cited by 20 (2 self)
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, which directs the changes in appropriate directions. We apply a recently introduced policy-learning algorithm from machine learning to networks of spiking neurons, and derive a spike-time-dependent plasticity rule which ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule in several toy problems. Finally, through statistical analysis we show that the synaptic plasticity rule established is closely related to the widely used BCM rule, for which good biological evidence exists. 1 Policy Learning and Neuronal Dynamics. Reinforcement Learning (RL) is a general term used for a class of learning problems in which an agent attempts to improve its performance over time at a given task (e.g., Bertsekas and Tsitsiklis, 1996; Sutton and Barto, 1998). Formally, it is the problem of mapping situations to actions in order to maximize a given reward signal. The interaction between the agent and the environment is modelled mathematically as a Partially Ob…
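The BCM rule the analysis connects to can be written down directly (a standard textbook form with a sliding threshold; the linear neuron and all constants here are illustrative, not the paper's derivation):

```python
import numpy as np

# BCM-style update for a single linear synapse: potentiation when the
# postsynaptic activity exceeds a sliding threshold theta (a running
# average of post^2), depression below it. Constants are illustrative.
rng = np.random.default_rng(1)
eta, tau = 0.01, 0.99
w, theta = 0.5, 0.0
for _ in range(1000):
    pre = rng.random()
    post = w * pre
    theta = tau * theta + (1 - tau) * post ** 2   # sliding threshold
    w += eta * pre * post * (post - theta)
```

The sliding threshold makes the rule self-stabilizing: as the weight grows, theta rises with the squared output and eventually opposes further potentiation.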
Spike-timing-dependent plasticity: The relationship to rate-based learning for models with weight dynamics determined by a stable fixed point
Neural Computation, 2004
Cited by 10 (1 self)
SpikeTiming Dependent Plasticity for Evolved Robots
2003
Abstract

Cited by 10 (2 self)
Plastic spiking neural networks are synthesized for phototactic robots using evolutionary techniques. Synaptic plasticity depends asymmetrically on the precise relative timing between presynaptic and postsynaptic spikes at the millisecond range and on longer-term activity-dependent regulatory scaling. Comparative studies have been carried out for different kinds of plastic neural networks with low and high levels of neural noise. In all cases, the evolved controllers are highly robust against internal synaptic decay and other perturbations.
Reinforcement learning in continuous state and action space
2003
Abstract

Cited by 8 (2 self)
To solve complex navigation tasks, autonomous agents such as rats or mobile robots often employ spatial representations. These “maps” can be used for localisation and navigation. We propose a model for spatial learning and navigation based on reinforcement learning. The state space is represented by a population of hippocampal place cells, whereas a large number of locomotor neurons in nucleus accumbens forms the action space. Using overlapping receptive fields for both populations, state/action mappings rapidly generalise during learning. The population vector allows a continuous interpretation of both state and action spaces. An eligibility trace is used to propagate reward information back in time. It enables the modification of behaviours for recent states. We propose a biologically plausible mechanism for this trace of events, where spike-timing-dependent plasticity triggers the storing of recent state/action pairs. These pairs, however, are forgotten in the absence of a reward-related signal such as dopamine. The model is validated on a simulated robot platform.
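The place-cell state representation and population-vector readout can be sketched as follows (cell count, tiling, and tuning width are assumptions for a 1-D track, not the paper's parameters):

```python
import numpy as np

# A population of place cells with overlapping Gaussian receptive fields
# tiling a 1-D track; the population vector yields a continuous position
# estimate, the kind of smooth state space the model builds on.
centers = np.linspace(0.0, 1.0, 21)   # place-field centers
sigma = 0.08                          # tuning width

def place_activity(x):
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))

def population_vector(activity):
    return float(np.sum(activity * centers) / np.sum(activity))

x_hat = population_vector(place_activity(0.37))
```

Because the fields overlap, positions between cell centers decode smoothly, which is what lets state/action mappings generalise to nearby, never-visited states.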
A view of Neural Networks as dynamical systems
International Journal of Bifurcation and Chaos, 2009, http://lanl.arxiv.org/abs/0901.2203
Abstract

Cited by 4 (2 self)
We present some recent investigations resulting from the modelling of neural networks as dynamical systems, dealing with the following questions, addressed in the context of specific models: (i) characterizing the collective dynamics; (ii) statistical analysis of spike trains; (iii) interplay between dynamics and network structure; (iv) effects of synaptic plasticity.
Does Morphology Influence Temporal Plasticity?
ICANN 2002, LNCS 2415, 2002
Abstract

Cited by 2 (0 self)
Applying a bounded, weight-independent temporal plasticity rule to synapses from independent Poisson-firing presynaptic neurons onto a conductance-based integrate-and-fire neuron leads to a bimodal distribution of synaptic strengths (Song et al., 2000). We extend this model to investigate the effect of spreading the synapses over the dendritic tree. The results suggest that distal synapses tend to lose out to proximal ones in the competition for synaptic strength. Against expectations, versions of the plasticity rule with a smoother transition between potentiation and depression make little difference to the distribution, or lead to all synapses losing.
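The underlying additive (weight-independent, hard-bounded) STDP step from Song et al. (2000) that this study builds on can be sketched as follows (the constants are typical illustrative values, not the paper's):

```python
import numpy as np

# Additive STDP: fixed-size, weight-independent updates with hard bounds.
# Slight LTD dominance plus hard clipping is what drives synapses toward
# a bimodal distribution concentrated at the two extremes.
A_plus, A_minus, tau = 0.005, 0.00525, 20.0   # step sizes and window (ms)
w_max = 1.0

def stdp_update(w, dt_spike):
    """dt_spike = t_post - t_pre (ms); returns the clipped new weight."""
    if dt_spike > 0:
        w += A_plus * np.exp(-dt_spike / tau)    # pre-before-post: LTP
    else:
        w -= A_minus * np.exp(dt_spike / tau)    # post-before-pre: LTD
    return float(np.clip(w, 0.0, w_max))
```

Because the update size does not depend on w, only the hard bounds stop growth or decay, so individual synapses accumulate at 0 or w_max rather than settling at intermediate values.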