Results 1 - 10 of 41
Reinforcement learning in the brain
"... Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computation ..."
Abstract
-
Cited by 48 (6 self)
- Add to MetaCart
(Show Context)
Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed the existence in the brain of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and …
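The temporal-difference prediction error this abstract refers to has a compact standard form, δ_t = r_t + γV(s_{t+1}) − V(s_t). A minimal sketch of how that signal drives value learning follows; the trial structure, learning rate, and discount factor are illustrative assumptions, not values from the studies reviewed here.

```python
# Minimal TD(0) value learning; delta is the putative dopamine signal.
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.95
V = np.zeros(n_states)  # value estimate for each within-trial state

def td_update(s, r, s_next):
    """One TD(0) update; returns the prediction error delta."""
    v_next = V[s_next] if s_next is not None else 0.0  # terminal value is 0
    delta = r + gamma * v_next - V[s]                  # delta = r + gamma*V(s') - V(s)
    V[s] += alpha * delta
    return delta

# A Pavlovian-style trial: cue at state 0, reward delivered at the final state.
for trial in range(500):
    for s in range(n_states):
        s_next = s + 1 if s + 1 < n_states else None
        r = 1.0 if s_next is None else 0.0
        td_update(s, r, s_next)

print(np.round(V, 2))  # values propagate back from the reward toward the cue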
Short-term gains, long-term pains: How cues about state aid learning in dynamic environments
, 2009
"... Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out be ..."
Abstract
-
Cited by 31 (6 self)
- Add to MetaCart
(Show Context)
Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision-making task that places short- and long-term rewards in conflict. Our goal in these studies was to evaluate how people’s mental representation of a task affects their ability to discover an optimal decision strategy. We find that perceptual cues that readily align with the underlying state of the task environment help people overcome the impulsive appeal of short-term rewards. Our experimental manipulations, predictions, and analyses are motivated by current work in reinforcement learning, which details how learners value delayed outcomes in sequential tasks and the importance that “state” identification plays in effective learning.
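The short-term/long-term conflict the abstract describes can be made concrete with a toy dynamic task. The payoff functions below are illustrative assumptions, not the paper's actual experiment: the myopic option pays more in every state, yet far-sighted choices shift the environment toward higher payoffs overall.

```python
# Toy dynamic decision task: 'state' tracks recent far-sighted choices.
def reward(choice, state):
    # Within any state the myopic option pays more, but far-sighted
    # choices raise the state and hence future payoffs for both options.
    return state + 5 if choice == "myopic" else state

def run(policy, trials=100):
    state, total = 0, 0
    for _ in range(trials):
        c = policy(state)
        total += reward(c, state)
        state = min(10, state + 1) if c == "farsighted" else max(0, state - 1)
    return total

print(run(lambda s: "myopic"))      # locally tempting, lower long-run total
print(run(lambda s: "farsighted"))  # sacrifices early reward, wins overall
```

A learner who can identify the underlying state, as the perceptual cues in these studies allow, can discover that the far-sighted policy dominates; without state information, the immediate payoff difference tends to dominate learning.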
Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System
, 2008
"... The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each mome ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances the correspondence between model and data in several experiments, including those in which rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
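The microstimulus construction can be sketched directly: each stimulus launches a decaying memory trace, and a bank of Gaussian basis functions over the trace height yields features that grow weaker and more diffuse with elapsed time. The specific constants below are illustrative assumptions, not the paper's fitted parameters.

```python
# Microstimulus features for one stimulus presented at t = 0.
import numpy as np

def microstimuli(t, n_basis=10, decay=0.985, sigma=0.08):
    trace = decay ** t                        # memory trace height at time t
    centers = np.linspace(1.0, 0.1, n_basis)  # basis centers along the trace
    # Each feature is the trace height weighted by a Gaussian around a center:
    return trace * np.exp(-(trace - centers) ** 2 / (2 * sigma ** 2))

for t in (0, 20, 60):
    print(t, np.round(microstimuli(t), 2))  # later times: weaker, more spread out
```

In the full model, the value estimate is a learned linear function of these features, V(t) = w·x(t), trained with the usual TD(λ) updates.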
A Neurocomputational Account of Catalepsy Sensitization Induced by D2-Receptor-Blockade in Rats: Context-Dependency, Extinction and Renewal
"... Rationale: Repeated haloperidol treatment in rodents results in a day-to-day intensification of catalepsy (i.e., sensitization). Prior experiments suggest that this sensitization is context dependent and resistant to extinction training. Objectives: To provide a neurobiological mechanistic explanati ..."
Abstract
-
Cited by 9 (7 self)
- Add to MetaCart
(Show Context)
Rationale: Repeated haloperidol treatment in rodents results in a day-to-day intensification of catalepsy (i.e., sensitization). Prior experiments suggest that this sensitization is context dependent and resistant to extinction training. Objectives: To provide a neurobiological, mechanistic explanation for these findings. Methods: We use a neurocomputational model of the basal ganglia and simulate two alternative models based on the reward prediction error and novelty hypotheses of dopamine function. We also conducted a behavioral rat experiment to adjudicate between these models. Twenty male Sprague-Dawley rats were challenged with 0.25 mg/kg haloperidol across multiple days and were subsequently tested in either a familiar or a novel context. Results: Simulation results show that catalepsy sensitization, and its context dependency, can be explained by “NoGo” learning via simulated D2 receptor antagonism in striatopallidal neurons, leading to increasingly slowed response latencies. The model further exhibits a non-extinguishable component of catalepsy sensitization, due to latent NoGo representations that are prevented from being expressed, and therefore from being unlearned, during extinction. In the rat experiment, context dependency effects did not depend on the novelty of the context, ruling out the novelty model’s account of context dependency. Conclusions: Simulations lend insight into potential mechanisms leading to context-dependent catalepsy sensitization, extinction, and renewal.
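A drastically simplified sketch of the “NoGo-learning” account: the update rule, the latency function, and all constants below are made-up illustrations, not the authors' full basal ganglia model.

```python
# Simulated D2 blockade strengthens NoGo (striatopallidal) associations in
# the drug context each day, progressively slowing responses.
go, nogo, alpha = 1.0, 0.0, 0.15

def latency(go_w, nogo_w):
    return 1.0 / max(0.05, go_w - nogo_w)  # slower responding as NoGo dominates

for day in range(1, 8):  # repeated haloperidol challenges
    nogo += alpha * (1.0 - nogo)
    print(f"day {day}: latency = {latency(go, nogo):.2f}")

# Off drug, the NoGo pathway is not expressed, so these weights are never
# activated and hence never unlearned: the sensitization resists extinction.
```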
A computational model of how cholinergic interneurons protect striatal-dependent learning
- Journal of Cognitive Neuroscience
, 2011
"... Abstract ■ An essential component of skill acquisition is learning the environmental conditions in which that skill is relevant. This article proposes and tests a neurobiologically detailed theory of how such learning is mediated. The theory assumes that a key component of this learning is provided ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
An essential component of skill acquisition is learning the environmental conditions in which that skill is relevant. This article proposes and tests a neurobiologically detailed theory of how such learning is mediated. The theory assumes that a key component of this learning is provided by the cholinergic interneurons in the striatum known as tonically active neurons (TANs). The TANs are assumed to exert a tonic inhibitory influence over cortical inputs to the striatum that prevents the execution of any striatal-dependent actions. The TANs learn to pause in rewarding environments, and this pause releases the striatal output neurons from this inhibitory effect, thereby facilitating the learning and expression of striatal-dependent behaviors. When rewards are no longer available, the TANs cease to pause, which protects striatal learning from decay. A computational version of this theory accounts for a variety of single-cell recording data and some classic behavioral phenomena, including fast reacquisition after extinction.
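A minimal sketch of the proposed TAN gate, assuming simple delta-rule dynamics for both the pause and the gated striatal weight; the constants and update rules are illustrative, not the article's full model.

```python
# TAN pause tracks reward availability; striatal plasticity (learning AND
# decay) only passes through when the gate (pause) is open.
def simulate(phases, a_pause=0.5, a_w=0.1):
    pause, w = 0.0, 0.0
    for rewarded in phases:
        pause += a_pause * ((1.0 if rewarded else 0.0) - pause)
        gate = pause                     # pausing releases striatal output neurons
        if rewarded:
            w += a_w * gate * (1.0 - w)  # gated cortico-striatal learning
        else:
            w -= a_w * gate * w          # decay also requires an open gate
    return round(pause, 2), round(w, 2)

print(simulate([True] * 30))                 # acquisition: pause ~1, weight ~1
print(simulate([True] * 30 + [False] * 30))  # extinction: the pause collapses
# within a few trials, closing the gate and leaving the weight largely intact,
# which is how the model produces fast reacquisition after extinction.
```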
Context, Learning, and Extinction
"... A. Redish et al. (2007) proposed a reinforcement learning model of context-dependent learning and extinction in conditioning experiments, using the idea of “state classification ” to categorize new observations into states. In the current article, the authors propose an interpretation of this idea i ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
A. Redish et al. (2007) proposed a reinforcement learning model of context-dependent learning and extinction in conditioning experiments, using the idea of “state classification” to categorize new observations into states. In the current article, the authors propose an interpretation of this idea in terms of normative statistical inference. They focus on renewal and latent inhibition, 2 conditioning paradigms in which contextual manipulations have been studied extensively, and show that online Bayesian inference within a model that assumes an unbounded number of latent causes can characterize a diverse set of behavioral results from such manipulations, some of which pose problems for the model of Redish et al. Moreover, in both paradigms, context dependence is absent in younger animals or when hippocampal lesions are made prior to training. The authors suggest an explanation in terms of a restricted capacity to infer new causes.
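An “unbounded number of latent causes” is commonly given a concrete prior via the Chinese restaurant process (CRP); the sketch below samples from a bare CRP prior with an arbitrary concentration parameter, which is a simplification of, not necessarily identical to, the article's full model.

```python
# Chinese restaurant process: each observation joins an existing cause with
# probability proportional to that cause's count, or founds a new cause with
# probability proportional to the concentration parameter alpha.
import random

def crp_assign(counts, alpha=1.0):
    """Sample a latent-cause index for a new observation under the CRP prior."""
    r = random.uniform(0, sum(counts) + alpha)
    for k, c in enumerate(counts):
        r -= c
        if r < 0:
            return k          # reuse an existing cause (e.g., "acquisition")
    return len(counts)        # infer a brand-new cause (e.g., "extinction")

counts = []
for _ in range(30):
    k = crp_assign(counts)
    if k == len(counts):
        counts.append(0)
    counts[k] += 1
print(counts)  # a few causes dominate; new ones appear at a slowly growing rate
```

In the full model, the posterior over causes also weighs how well each cause's learned associations explain the current stimuli and context; a restricted capacity to add new causes mimics young or hippocampal-lesioned animals.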
Erasing the Engram: The Unlearning of Procedural Skills
"... This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Huge amounts of money are spent every year on unlearning programs—in drug-treatme ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Huge amounts of money are spent every year on unlearning programs in drug-treatment facilities, prisons, psychotherapy clinics, and schools. Yet almost all of these programs fail, as evidenced by high recidivism rates in each of these fields. Progress on this problem requires a better understanding of the mechanisms that make unlearning so difficult. Much cognitive neuroscience evidence suggests that an important component of these mechanisms also dictates success on categorization tasks that recruit procedural learning and depend on synaptic plasticity within the striatum. A biologically detailed computational model of this striatal-dependent learning is described (based on Ashby & Crossley, 2011). The model assumes that a key component of striatal-dependent learning is provided by interneurons in the striatum called the tonically active neurons (TANs), which act as a gate for the learning and expression of striatal-dependent behaviors. In their tonically active state, the TANs prevent the expression of any striatal-dependent behavior. However, they learn to pause in rewarding environments and thereby permit the learning and expression of striatal-dependent behaviors. The model predicts that when rewards are no longer contingent on behavior, the TANs cease to pause, which protects striatal learning from decay and prevents unlearning. In addition, the model predicts that when rewards are partially contingent on behavior, the TANs remain partially paused, leaving the striatum available for unlearning. The results from 3 human behavioral studies support the model predictions and suggest a novel unlearning protocol that shows promising initial signs of success.
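The partial-contingency prediction at the end of this abstract can be isolated in a few lines, reusing the gate idea sketched above for the TAN model; the update rules and constants are again illustrative assumptions.

```python
# Start fully trained (TANs pause, habit strong), then train under a given
# reward contingency; the pause tracks contingency, and weight decay only
# passes through the open gate.
def unlearning(contingency, steps=40, a_pause=0.5, a_w=0.2):
    pause, w = 1.0, 1.0
    for _ in range(steps):
        pause += a_pause * (contingency - pause)
        w -= a_w * pause * w
    return round(w, 2)

print(unlearning(0.0))  # no contingency: gate slams shut, habit survives (~0.8)
print(unlearning(0.5))  # partial contingency: gate stays half open,
                        # habit decays away (~0.01) -- unlearning succeeds
```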
Melioration as Rational Choice: Sequential Decision Making in Uncertain Environments
"... Melioration—defined as choosing a lesser, local gain over a greater longer term gain—is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
Melioration, defined as choosing a lesser local gain over a greater long-term gain, is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of economic rationality. In some environments, the relationship between actions and outcomes is known. In this case, the rationality of choice behavior can be evaluated in terms of how successfully it maximizes utility given knowledge of the environmental contingencies. In most complex environments, however, the relationship between actions and future outcomes is uncertain and must be learned from experience. When the difficulty of this learning challenge is taken into account, it is not evident that melioration represents suboptimal choice behavior. In the present article, we examine human performance in a sequential decision-making experiment that is known to induce meliorating behavior. In keeping with previous results using this paradigm, we find that the majority of participants in the experiment fail to adopt the optimal decision strategy and instead demonstrate a significant bias toward melioration. To explore the origins of this behavior, we develop a rational analysis (Anderson, 1990) of the learning problem facing individuals in uncertain decision environments. Our analysis demonstrates that an unbiased learner would adopt melioration as the optimal response strategy for maximizing long-term gain. We suggest that many documented cases of melioration can be reinterpreted not as irrational choice but rather as globally optimal choice under uncertainty.
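The paradigm's structure is easy to state concretely: the locally better option pays more at every allocation, yet exclusively choosing the other option yields more overall. The payoff functions below are illustrative assumptions in the spirit of Herrnstein's procedure, not the experiment's actual schedule.

```python
# Payoffs depend on the fraction of recent choices allocated to option B.
from collections import deque

def run(policy, trials=200, window=10):
    history, total = deque(maxlen=window), 0.0
    for _ in range(trials):
        frac_b = sum(history) / window          # recent allocation to option B
        payoff = {"A": 4 + 6 * frac_b,          # A beats B at every allocation...
                  "B": 2 + 6 * frac_b}          # ...but choosing B raises both payoffs
        choice = policy(payoff)
        total += payoff[choice]
        history.append(1 if choice == "B" else 0)
    return total

print(run(lambda p: max(p, key=p.get)))  # meliorating (greedy): locks onto A, ~800
print(run(lambda p: "B"))                # globally optimal: exclusive B, ~1570
```

The paper's rational analysis asks what a learner who must infer this hidden dependence from noisy experience should do; under that uncertainty, tracking local reward rates, i.e., melioration, can itself be the optimal response.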
Pathological gambling and the loss of willpower: A neurocognitive perspective
- Socioaffective Neurosci. Psychol
"... The purpose of this review is to gain more insight on the neurocognitive processes involved in the maintenance of pathological gambling. Firstly, we describe structural factors of gambling games that could promote the repetition of gambling experiences to such an extent that some individuals may bec ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
The purpose of this review is to gain more insight into the neurocognitive processes involved in the maintenance of pathological gambling. First, we describe structural factors of gambling games that could promote the repetition of gambling experiences to such an extent that some individuals become unable to control their gambling habits. Second, we review findings of neurocognitive studies on pathological gambling. As a whole, the poor ability to resist gambling reflects an imbalance among one or more of three key neural systems: (1) a hyperactive ‘impulsive’ system, which is fast, automatic, and unconscious and promotes automatic and habitual actions; (2) a hypoactive ‘reflective’ system, which is slow and deliberative and supports forecasting the future consequences of a behavior, inhibitory control, and self-awareness; and (3) the interoceptive system, which translates bottom-up somatic signals into a subjective state of craving that in turn potentiates the activity of the impulsive system and/or weakens or hijacks the goal-driven cognitive resources needed for the normal operation of the reflective system. Based on this theoretical background, we focus on clinical interventions that could reduce the risks of both gambling addiction and relapse.