48 citations found.
Schultz, W., Dayan, P., Montague, P.R. (1997) A Neural substrate of prediction and reward. Science, 275: 5306, 1593-1599.

This paper is cited in the following contexts:

Proceedings of the Second International Conference on.. - Dopamine And Inference   (Correct)

....altogether. The dopamine signal is modeled as a TD error signal for learning to predict future rewards from this inferred state representation. 1 Introduction Several investigators have suggested that the primate dopamine system carries an error signal for learning to predict future rewards [1, 2, 3]. These models, based on temporal difference (TD) learning [4], explain most phasic responses of primate dopamine neurons in appetitive conditioning [5]; moreover, they suggest a neurophysiological account of animal conditioning behavior. But because existing models are based in the simple formal ....

....a sample of the right side of equation 2 can be computed, and the estimate $V(s_t)$ can be updated in its direction. The change in $V(s_t)$ is proportional to the difference between the approximated left and right sides of equation 2: $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ (3). The TD models [1, 2, 3] propose that the activity of dopamine neurons reflects this error signal $\delta_t$. This paper considers TD algorithms for a richer setting, a partially observable semi Markov process. [Figure 1 (panels a, b; labels: stimulus, ISI, reward, ITI): Models of a conditioning task. (a) ....
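
The update described in the excerpt above can be illustrated with a minimal sketch. It assumes a tabular value function, a learning rate `alpha`, a discount factor `gamma`, and a toy two-step trial (stimulus, then reward); none of these names or values come from the cited paper, which treats a richer partially observable setting.

```python
# Minimal tabular TD(0) sketch. All names and parameter values are illustrative
# assumptions, not taken from the cited paper.

def td_error(r, v_next, v_curr, gamma=1.0):
    """Temporal difference error: reward plus discounted next value minus current value."""
    return r + gamma * v_next - v_curr


def td_update(V, s, s_next, r, alpha=0.1, gamma=1.0):
    """Move V[s] a small step in the direction of the TD error and return that error."""
    delta = td_error(r, V[s_next], V[s], gamma)
    V[s] += alpha * delta
    return delta


# Toy trial: a stimulus state is followed by a reward state, which pays 1.0 on exit.
V = {"stimulus": 0.0, "reward": 0.0, "end": 0.0}
for _ in range(200):
    td_update(V, "stimulus", "reward", r=0.0)
    td_update(V, "reward", "end", r=1.0)
```

After repeated trials both learned values approach the total future reward of 1.0, so the error at the time of reward shrinks toward zero.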

[Article contains additional citation context not shown here]

W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593--1599, 1997.


Control of Exploitation-Exploration Meta-Parameter in.. - Ishii, Yoshida.. (2002)   (1 citation)  (Correct)

....adopts that approximation. In addition, we also use a formulation of partially observable Markov decision process (POMDP). A POMDP assumes that the environment involves unobservable information, typically, unobservable state variables. Although RL is a machine learning framework, recent studies [46, 62] showed that in a real brain a dopaminergic system including the basal ganglia and the frontal cortex seems to realize a similar learning scheme. Doya [16] has suggested that parameters used in RL, which are called meta parameters, may correspond to neuromodulators such as serotonin, ....

....substantia nigra have long been engaged on the processing of reward stimuli. Recording studies using alert monkeys showed that DA neurons in the VTA were activated by stimuli associated with reward prediction and it was suggested that the neurons represent error in the prediction of future reward [46]. Since DA neurons were also activated when provided novel and salient stimuli [47], signals transferred by DA neurons may be modified by the novelty bonus information. DA neurons receive massive inputs from amygdala [21] that responds to primary rewards and reward predicting stimuli. In ....

Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.


Timing and Partial Observability in the Dopamine System - Daw, Courville, Touretzky (2003)   (Correct)

....of temporal variability. By combining statistical model based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models. 1 Introduction A series of models [1, 2, 3, 4, 5] based on temporal difference (TD) learning [6] has explained most responses of primate dopamine (DA) neurons during conditioning [7] as an error signal for predicting reward, and has also identified the DA system as a substrate for conditioning behavior [8]. We address a troublesome issue from ....

....(e,f) Modeled DA activity when reward timing varies uniformly over a range. The tapped delay line model (e) incorrectly predicts identical excitation to rewards delivered at all times, while, in accord with experiment, our model (f) predicts a response that declines with delay. Several models [1, 2, 3, 4, 5] identify the firing of DA neurons with the reward prediction error signal $\delta_t$ of a TD algorithm [6]. In the models, DA neurons are excited by positive error in reward prediction (caused by unexpected rewards or reward predicting stimuli) and inhibited by negative prediction error (caused by the ....
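
The sign pattern described in this excerpt can be reproduced with a small simulation. The sketch below assumes a tapped-delay-line style representation (one value per time step after stimulus onset), a fixed stimulus-to-reward interval, and a pre-stimulus baseline value of zero; the function name `run_trial` and all parameter values are hypothetical and are not the cited authors' model.

```python
# Sketch of TD prediction errors over a conditioning trial with fixed reward timing.
# Representation, names, and parameters are illustrative assumptions.

def run_trial(V, rewards, alpha=0.2, gamma=1.0, learn=True):
    """Return TD errors for one trial.

    V[t] is the value of time step t after stimulus onset.
    deltas[0] is the error at stimulus onset, assuming the preceding
    inter-trial state has value 0 and gives no reward.
    deltas[t + 1] is the error on leaving time step t.
    """
    deltas = [gamma * V[0]]
    for t in range(len(V)):
        v_next = V[t + 1] if t + 1 < len(V) else 0.0
        delta = rewards[t] + gamma * v_next - V[t]
        if learn:
            V[t] += alpha * delta
        deltas.append(delta)
    return deltas


T = 5
rewarded = [0.0] * (T - 1) + [1.0]  # reward arrives at the last time step
omitted = [0.0] * T                 # same trial with the reward withheld

V = [0.0] * T
first = run_trial(V, rewarded)      # naive: the reward is unexpected
for _ in range(500):
    run_trial(V, rewarded)
trained = run_trial(V, rewarded, learn=False)
missed = run_trial(V, omitted, learn=False)
```

Before learning, the error is positive at the time of reward (`first[-1]`). After learning, the positive error has moved to stimulus onset (`trained[0]`), the error at the predicted reward is near zero (`trained[-1]`), and withholding the reward produces a negative error (`missed[-1]`).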

[Article contains additional citation context not shown here]

W Schultz, P Dayan, and PR Montague. A neural substrate of prediction and reward. Science, 275:1593--1599, 1997.


Combining Configural and TD Learning on a Robot - Touretzky, Daw, Tira-Thompson (2002)   (Correct)

.... a network to predict $V(t)$ is simply the difference between the left and right sides of the above equation: $\delta(t) = \gamma V(t+1) - V(t) + r(t)$. The error signal $\delta(t)$ has attracted considerable interest because it appears to be a good model of the primate dopamine system's response to reward prediction error [5, 6, 7]. This led to the suggestion that $V(t)$ may be computed in the striatum [5]. Daw and Touretzky [8] argue that an average reward version of TD suggested by Tsitsiklis [9] is preferable to the exponential discounting formulation because it connects more directly with psychological theories of ....
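
To make the contrast between exponential discounting and an average reward formulation concrete, here is a hedged sketch. It uses a commonly stated form of the average-reward TD error, in which a running estimate `rho` of the reward per step replaces the discount factor; this form, the two-state toy task, and the step sizes `alpha` and `beta` are assumptions for illustration and are not taken from the cited papers.

```python
# Discounted versus average-reward TD errors on a toy two-state cycle.
# The task and all parameter values are illustrative assumptions.

def discounted_td_error(r, v_next, v_curr, gamma=0.9):
    """Exponentially discounted TD error."""
    return r + gamma * v_next - v_curr


def average_reward_td_error(r, v_next, v_curr, rho):
    """Average-reward TD error: reward is measured relative to the running average rho."""
    return r - rho + v_next - v_curr


# States 0 and 1 alternate forever; leaving state 1 pays a reward of 1.0.
V = [0.0, 0.0]
rho = 0.0
alpha, beta = 0.1, 0.01
s = 0
for _ in range(5000):
    s_next = 1 - s
    r = 1.0 if s == 1 else 0.0
    delta = average_reward_td_error(r, V[s_next], V[s], rho)
    V[s] += alpha * delta
    rho += beta * (r - rho)  # track the average reward per step
    s = s_next
```

In this toy task `rho` settles near 0.5 (one unit of reward every two steps), and the learned values are relative: state 1, which immediately precedes reward, ends up about 0.5 above state 0.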

W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593--1599, 1997.


Evolution of Reinforcement Learning in Foraging Bees.. - Niv, Joel, Meilijson, ..   (Correct)

....weights come to represent the expected rewards from each flower type. The behavior of the decision unit P in this model is similar to the firing patterns characteristic of dopaminergic neurons in the SNc and VTA of primates as have been recorded in a monkey performing a delayed match to sample task [30, 39], and is thus consistent with the hypothesis that these dopaminergic neurons compute a prediction error [38]. While this model replicates Real's foraging results and provides a basic and simple NN architecture to solve RL tasks, many aspects of the model, first and foremost the handcrafted ....

....which is not dependent on the reward module results in a disruption of the correct predictions and fails to produce the desired RL behavior. Several studies have suggested that dopaminergic neurons in the basal ganglia may constitute a biological implementation of a TD reinforcement signal [28, 30, 39], and together with the striatum may implement an actor critic learning model [20, 37]. The output of unit P in our model, as in that of Montague et al.'s model [29], quite accurately captures the essence of the activity patterns of midbrain dopaminergic neurons [30, 39] in primates and rodents, ....

[Article contains additional citation context not shown here]

W. Schultz, P. Dayan, and P.R. Montague. A neural substrate of prediction and reward. Science, 275:1593-1599, March 1997.


Temporal Hebbian learning in Rate-Coded Neural Networks: A.. - Porr, Wörgötter   (Correct)

.... conditioned stimulus will elicit the same event and, thus, the unconditioned stimulus can be interpreted as a predictor of the conditioned stimulus (the sound of a bell predicts the feeding) So far temporal sequence learning has in general been achieved by the temporal difference (TD) algorithm [7, 8]. This algorithm is discrete in time and amplitude. Instead, in our approach we have chosen a rate code description as the level of abstraction, which allows to treat continuous signals and this has the advantage that we can develop the theory in an analytically solvable way. In order to implement ....

Wolfram Schultz, Peter Dayan, and P. Read Montague. A neural substrate of prediction and reward. Science, 275:1593--1599, 1997.


Learning to Predict Variable-Delay Rewards and Its.. - Pérez-Uribe, Michèle..   (Correct)

....and forget the inadequate actions [12]. Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so called dopamine neurons have been shown to code an error in the temporal prediction of rewards [8]. Similarly, artificial systems can learn to predict by the so called temporal difference (TD) methods [12]. Temporal difference methods are computational models that have been developed to solve a wide range of reinforcement learning problems including game learning and adaptive control tasks ....

.... a wide range of reinforcement learning problems including game learning and adaptive control tasks [12]. The system learns to predict rewards by trial and error, based on the idea that learning is driven by changes in the expectations about future salient events such as rewards and punishments [8]. In this paper, we use a biologically plausible TD learning architecture to learn to predict variable delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Moreover, we analyze the robustness of the model by considering stimuli with variable ....

[Article contains additional citation context not shown here]

W. Schultz, P. Dayan, and P. Read Montague. A Neural Substrate of Prediction and Reward. Science, 275:1593--1599, 14 March 1997.


Learning and Foraging in Robot-bees - Pérez-Uribe, Hirsbrunner   (Correct)

....robot bee landed on an artificial flower and to 0.0 otherwise. Therefore, weight changes only occurred during encounters with flowers (moreover, it is assumed that V (t) 0 when the robot bee gathers nectar at time t). This corresponds to a dopamine neuron based neuromodulation of plasticity (Schultz et al. 1997). This makes part of our study of a new class of learning algorithms which are being developed by neuroscientists based on experimental work with monkeys. We are particularly interested to use such new algorithms and to test them with autonomous mobile robots given their potential for prediction ....

....different rewarding nectar. In a first set of experiments we used a Hebbian learning model with neuromodulatory signals previously developed by neuroscientists. This model is biologically plausible and introduces the promising idea of using dopamine like neural signals that modulate learning (Schultz et al. 1997). We are particularly interested to use such new algorithms and test them with autonomous mobile robots given their potential for prediction and learning of sequential behavior (Suri and Schultz, 1998). The resulting foraging behavior is a biased random walk similar to that observed in bacteria. ....

Schultz, W., Dayan, P., and Montague, P. R. (1997). A Neural Substrate of Prediction and Reward. Science, 275:1593--1599.


Of Implementing Neural Epigenesis, Reinforcement Learning, and.. - Perez-Uribe   (Correct)

.... While in the former case only resonant states can drive new learning (i.e. when the current inputs sufficiently match the system's expectations) (Grossberg, 1998), in the latter learning is driven by changes in the expectations about future salient events such as rewards and punishments (Schultz et al. 1997). 2 Our neurocontroller architecture We have developed a neurocontroller architecture (Fig. 1) based on the above premises (environmentally guided neural circuit building for unsupervised adaptive clustering and trial and error learning of behaviors) and tested it using an autonomous mobile ....

W. Schultz, P. Dayan, and P. Read Montague. A Neural Substrate of Prediction and Reward. Science, 275: 1593--1599, 1997.


Using a Time-Delay Actor-Critic Neural Architecture With.. - Pérez-Uribe   (Correct)

....[20] The use of search and memory gives rise to what is called trial and error learning. Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so called dopamine neurons have been shown to code an error in the temporal prediction of rewards [15]. Similarly, artificial systems can learn to predict by the so called temporal difference (TD) methods [19]. Temporal difference methods are computational models that have been developed to solve a wide range of reinforcement learning problems including game learning and adaptive control tasks ....

....been used to replicate foraging behavior of bumblebees [8], learning of sequential movements [17], etc. The system learns to predict rewards by trial and error, based on the idea that learning is driven by changes in the expectations about future salient events such as rewards and punishments [15]. Such changes are detected by computing temporal differences of the estimations of rewards. Based on the general resemblance between the TD computation of the changes of expectations about rewards and the response of dopamine neurons, Suri and Schultz [18] developed a particular TD model they ....

[Article contains additional citation context not shown here]

W. Schultz, P. Dayan, and P. Read Montague. A Neural Substrate of Prediction and Reward. Science, 275:1593--1599, 14 March 1997.


A Neural Network Model with Dopamine-Like Reinforcement Signal .. - Suri, Schultz (1999)   (3 citations)  Self-citation (Schultz)   (Correct)

No context found.

Schultz W., Dayan P. and Montague P. R. (1997) A neural substrate of prediction and reward. Science 275, (5306) 1593--1599.


Behavioral considerations suggest an average reward TD model .. - Daw, Touretzky (2000)   Self-citation (Dayan)   (Correct)

....applicability to behavioral data. © 2000 Published by Elsevier Science B.V. All rights reserved. Keywords: Dopamine; Exponential discounting; Temporal difference learning 1. Introduction Recently there has been much interest in reinforcement learning models of the dopamine system. These models [3,7,10] explain data on primate midbrain dopamine neurons (reviewed in [9]) in terms of temporal difference (TD) learning [11]. As TD models have also been applied to animal behavior [12], the TD interpretation of the dopamine system has held out hope for a theory connecting neuronal responses to ....

....models behavioral applicability. 2. TD models of dopamine Recordings by Schultz and collaborators (reviewed in [9]) show that dopamine neurons in primate substantia nigra pars compacta (SNc) and ventral tegmental area (VTA) respond to rewards and reward predicting stimuli. Several models [3,7,10] propose that this activity signals reward prediction error, as computed by TD learning [11]. TD systems learn to associate states of the world with a value function predicting future rewards, and can use these predictions to select actions maximizing the value function. The value function, ....

W. Schultz, P. Dayan, P.R. Montague, A neural substrate of prediction and reward, Science 275 (1997) 1593-1599.


ACh, Uncertainty, and Cortical Inference - Dayan, Yu (2001)   Self-citation (Dayan)   (Correct)

No context found.

Schultz, W, Dayan, P & Montague, PR (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.


Motivated Reinforcement Learning - Dayan (2001)   (2 citations)  Self-citation (Dayan)   (Correct)

No context found.

Schultz, W, Dayan, P & Montague, PR (1997) A neural substrate of prediction and reward. Science 275:1593-1599.


How Do Computational Models of the Role of Dopamine as a.. - Error Map On   (Correct)

No context found.

Schultz, W., Dayan, P., Montague, P.R. (1997) A Neural substrate of prediction and reward. Science, 275: 5306, 1593-1599.


Timing and Partial Observability in the - Dopamine System Nathaniel   (Correct)

No context found.

W Schultz, P Dayan, and PR Montague. A neural substrate of prediction and reward. Science, 275:1593--1599, 1997.


A Computational Model Of The Emergence Of Gaze Following - Carlson, Triesch (2003)   (1 citation)  (Correct)

No context found.

W. Schultz, P. Dayan, and P. R. Montague. A neural substrate of prediction and reward. Science, 275:1593--1599, 1997.


Reinforcement Learning and Artificial Intelligence - Sutton (2003)   (Correct)

No context found.

Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275:1593--1599.


International Journal of Neural Systems, Vol. 11, No. 2 .. - World Scientific..   (Correct)

No context found.

W. Schultz, P. Dayan and P. R. Montague 1997, "A neural substrate of prediction and reward," Science 275, 1593--1599.


Estimating Internal Variables and Parameters of a.. - Kazuyuki Samejima Kenji (2003)   (Correct)

No context found.

Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science. 14;275(5306):1593-1599


A Theory of Cognitive Control, Aging Cognition, and.. - Todd Braver Deanna (2002)   (Correct)

No context found.

Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science 1997;275:1593 -- 9.


Proceedings of the Second International Conference on.. - Dopamine And Inference   (Correct)

No context found.

W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593--1599, 1997.


Matters Temporal - Dayan (2002)   (Correct)

No context found.

Schultz, W. et al. (1997) A neural substrate of prediction and reward. Science 275, 1593--1599


Conclusion - Vi Summary As   (Correct)

No context found.

W. Schultz, P. Dayan, and P. Read Montague. A neural substrate of prediction and reward. Science, 275(5306):1593--1599, 1997.


The Basal Ganglia: A Vertebrate Solution To The Selection .. - Redgrave, Prescott.. (1999)   (5 citations)  (Correct)

No context found.

Schultz W., Dayan P. and Montague P.R. (1997) A neural substrate of prediction and reward. Science 275, 1593-1599.

CiteSeer.IST - Copyright Penn State and NEC