Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling. (2007)

by A D Redish, S Jensen, A Johnson, Z Kurth-Nelson
Results 1 - 10 of 41

Reinforcement learning in the brain

by Yael Niv
"... Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computation ..."
Abstract - Cited by 48 (6 self) - Add to MetaCart
Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and

Citation Context

...negative prediction errors result in the inference of a new state S and new learning about this state, explained how both the original conditioned values and the new extinction knowledge can co-exist (Redish et al., 2007). Further modeling work awaits. 4 Conclusions To summarize, computational models of learning have done much to advance our understanding of decision making in the last couple of decades. Temporal dif...
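The mechanics at issue here are the temporal-difference reward-prediction error, delta = r + gamma*V(s') - V(s), and the Redish et al. (2007) proposal that sustained negative errors lead the learner to infer a new state rather than unlearn the old one. The Python sketch below is illustrative only: the learning rate, discount, splitting threshold, and the "_ext" state naming are assumptions, not the published model.

    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.95
    SPLIT_THRESHOLD = -2.0           # assumed trigger for inferring a new state

    V = defaultdict(float)           # state -> estimated value
    neg_delta = defaultdict(float)   # accumulated negative TD errors per state

    def td_update(state, reward, next_state):
        # TD(0) reward-prediction error: delta = r + gamma * V(s') - V(s)
        delta = reward + GAMMA * V[next_state] - V[state]
        V[state] += ALPHA * delta
        if delta < 0:
            neg_delta[state] += delta
        return delta

    def maybe_split(state):
        # Redish-style state classification: persistent negative errors suggest the
        # world has changed, so file new experience under a new state instead of
        # overwriting the old conditioned value.
        if neg_delta[state] < SPLIT_THRESHOLD:
            neg_delta[state] = 0.0
            return state + "_ext"
        return state

    state = "cue"
    for trial in range(200):
        reward = 1.0 if trial < 100 else 0.0     # reward withheld after trial 100
        td_update(state, reward, next_state="terminal")
        state = maybe_split(state)

    print("V(cue) preserved:", round(V["cue"], 2), "| V(cue_ext):", round(V["cue_ext"], 2))

After extinction the original value V(cue) is largely preserved while the new state carries the extinction learning, which is the co-existence the citation context refers to.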

Short-term gains, long-term pains: How cues about state aid learning in dynamic environments

by Todd M. Gureckis, Bradley C. Love, 2009
"... Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out be ..."
Abstract - Cited by 31 (6 self) - Add to MetaCart
Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision-making task which places short- and long-term rewards in conflict. Our goal in these studies was to evaluate how people’s mental representation of a task affects their ability to discover an optimal decision strategy. We find that perceptual cues that readily align with the underlying state of the task environment help people overcome the impulsive appeal of short-term rewards. Our experimental manipulations, predictions, and analyses are motivated by current work in reinforcement learning which details how learners value delayed outcomes in sequential tasks and the importance that “state” identification plays in effective learning.

Citation Context

... to participant’s trial-by-trial choices). On the theoretical side, our results provide a first step in linking work in category learning and generalization into models of sequential choice (see also Redish, Jensen, Johnson, & Kurth-Nelson, 2007, for recent work on this issue). In our simulations, we showed how cues that allowed extrapolation or generalization of the experience in one part of the state space to others could improve performanc...
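To make the role of state cues concrete, here is a toy Python comparison, not the authors' task: an invented environment in which the myopic option always pays more immediately but erodes a hidden variable h that both payoffs depend on. A Q-learner that observes h can discover the farsighted strategy, whereas one that lumps all trials into a single state tends to drift toward the short-term option. Every payoff and parameter is an assumption made for illustration.

    import random
    random.seed(0)

    ALPHA, GAMMA, EPSILON, H_MAX = 0.1, 0.95, 0.1, 5

    def step(h, action):
        # action 0 = short-term (pays more now, erodes h); action 1 = long-term (builds h)
        reward = (2 + h) if action == 0 else (1 + h)
        h = max(0, h - 1) if action == 0 else min(H_MAX, h + 1)
        return h, reward

    def run(observe_state, trials=20000):
        q = {(s, a): 0.0 for s in range(H_MAX + 1) for a in (0, 1)}
        h, late_reward = 0, 0.0
        for t in range(trials):
            s = h if observe_state else 0        # the cue-less learner sees one state
            a = random.choice((0, 1)) if random.random() < EPSILON \
                else max((0, 1), key=lambda x: q[(s, x)])
            h, r = step(h, a)
            s2 = h if observe_state else 0
            q[(s, a)] += ALPHA * (r + GAMMA * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)])
            if t >= trials - 1000:
                late_reward += r
        return round(late_reward / 1000, 2)      # average reward late in training

    print("late reward with a state cue   :", run(True))
    print("late reward without a state cue:", run(False))

In typical runs the cued learner ends up near the long-run optimum while the cue-less learner settles close to the myopic payoff, mirroring the abstract's claim that cues aligned with the underlying state help overcome the impulsive appeal of short-term rewards.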

Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System

by Elliot A. Ludvig, Richard S. Sutton, E. James Kehoe, 2008
"... The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each mome ..."
Abstract - Cited by 11 (2 self) - Add to MetaCart
The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those when rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
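The abstract's microstimulus idea can be sketched in a few lines, with the caveat that the constants below are illustrative and not the paper's fitted values: a stimulus launches an exponentially decaying memory trace, and a bank of Gaussian basis functions over the trace height yields microstimuli that grow weaker and more diffuse with elapsed time. A linear TD learner would form its reward predictions from this feature vector.

    import math

    N_MICRO, DECAY, SIGMA = 10, 0.985, 0.08     # assumed values for illustration

    def microstimuli(steps_since_onset):
        # Decaying memory trace in (0, 1]; each microstimulus is a Gaussian bump
        # over the trace height, scaled by the trace itself.
        trace = DECAY ** steps_since_onset
        centers = [(i + 1) / N_MICRO for i in range(N_MICRO)]
        return [trace * math.exp(-((trace - c) ** 2) / (2 * SIGMA ** 2)) for c in centers]

    # Shortly after onset the near-1.0 microstimuli dominate; later, progressively
    # lower and weaker ones take over, giving the temporal generalization described above.
    for t in (0, 20, 60, 120):
        feats = microstimuli(t)
        peak = max(range(N_MICRO), key=lambda i: feats[i])
        print(f"t={t:3d}  strongest microstimulus #{peak + 1}, amplitude {feats[peak]:.3f}")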

A Neurocomputational Account of Catalepsy Sensitization Induced by D2-Receptor-Blockade in Rats: Context-Dependency, Extinction and Renewal

by Thomas V. Wiecki, Katrin Riedinger, Andreas Ameln-Mayerhofer, Werner J. Schmidt (deceased), Michael J. Frank
"... Rationale: Repeated haloperidol treatment in rodents results in a day-to-day intensification of catalepsy (i.e., sensitization). Prior experiments suggest that this sensitization is context dependent and resistant to extinction training. Objectives: To provide a neurobiological mechanistic explanati ..."
Abstract - Cited by 9 (7 self) - Add to MetaCart
Rationale: Repeated haloperidol treatment in rodents results in a day-to-day intensification of catalepsy (i.e., sensitization). Prior experiments suggest that this sensitization is context dependent and resistant to extinction training. Objectives: To provide a neurobiological mechanistic explanation for these findings. Methods: We use a neurocomputational model of the basal ganglia and simulate two alternative models based on the reward prediction error and novelty hypotheses of dopamine function. We also conducted a behavioral rat experiment to adjudicate between these models. Twenty male Sprague-Dawley rats were challenged with 0.25 mg/kg haloperidol across multiple days, and were subsequently tested in either a familiar or novel context. Results: Simulation results show that catalepsy sensitization, and its context dependency, can be explained by “NoGo” learning via simulated D2 receptor antagonism in striatopallidal neurons, leading to increasingly slowed response latencies. The model further exhibits a non-extinguishable component of catalepsy sensitization, due to latent NoGo representations that are prevented from being expressed, and therefore from being unlearned, during extinction. In the rat experiment, context dependency effects were not dependent on the novelty of the context, ruling out the novelty model’s account of context dependency. Conclusions: Simulations lend insight into potential complex mechanisms leading to context-dependent catalepsy sensitization, extinction, and renewal.

Citation Context

...ive to animals who had not been previously sensitized (Amtage & Schmidt, 2003) (see Fig. 2). Similar sensitization, extinction, and renewal phenomena are observed in response to drugs of abuse (e.g., Redish, Jensen, Johnson, & Kurth-Nelson, 2007), raising the question of whether similar principles apply. Despite several years of research into these phenomena, a well elaborated mechanistic explanation for these observations is still lacking...
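The Python fragment below is a drastic simplification of the verbal account, not the authors' spiking basal ganglia model: one NoGo weight per context, a D2-blockade term that both drives the NoGo pathway and permits its expression, and plasticity that applies only to expressed activity. It reproduces the qualitative pattern of day-to-day sensitization, near-baseline latencies (and little unlearning) during drug-free extinction, and context specificity; every constant is an assumption.

    LR, BASELINE = 0.2, 1.0

    def trial(weights, context, haloperidol):
        nogo = weights[context]
        # D2 blockade both drives the NoGo pathway and lets its learned weights be
        # expressed; with D2 signaling intact, expression is mostly gated off.
        activity = (nogo + 1.0) if haloperidol else 0.1 * nogo
        latency = BASELINE + activity                 # catalepsy = slowed responding
        # Expression-gated plasticity: only expressed activity is learned or unlearned,
        # so drug-free extinction barely touches the latent NoGo representation.
        nogo += LR * activity if haloperidol else -LR * activity
        weights[context] = max(0.0, nogo)
        return round(latency, 2)

    weights = {"A": 0.0, "B": 0.0}
    print("sensitization (drug, ctx A):", [trial(weights, "A", True) for _ in range(6)])
    print("extinction (no drug, ctx A):", [trial(weights, "A", False) for _ in range(6)])
    print("renewal (drug, ctx A):", trial(weights, "A", True),
          "| novel ctx B:", trial(weights, "B", True))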

A computational model of how cholinergic interneurons protect striatal-dependent learning

by F. Gregory Ashby, Matthew J. Crossley - Journal of Cognitive Neuroscience, 2011
"... Abstract ■ An essential component of skill acquisition is learning the environmental conditions in which that skill is relevant. This article proposes and tests a neurobiologically detailed theory of how such learning is mediated. The theory assumes that a key component of this learning is provided ..."
Abstract - Cited by 7 (1 self) - Add to MetaCart
An essential component of skill acquisition is learning the environmental conditions in which that skill is relevant. This article proposes and tests a neurobiologically detailed theory of how such learning is mediated. The theory assumes that a key component of this learning is provided by the cholinergic interneurons in the striatum known as tonically active neurons (TANs). The TANs are assumed to exert a tonic inhibitory influence over cortical inputs to the striatum that prevents the execution of any striatal-dependent actions. The TANs learn to pause in rewarding environments, and this pause releases the striatal output neurons from this inhibitory effect, thereby facilitating the learning and expression of striatal-dependent behaviors. When rewards are no longer available, the TANs cease to pause, which protects striatal learning from decay. A computational version of this theory accounts for a variety of single-cell recording data and some classic behavioral phenomena, including fast reacquisition after extinction.

Citation Context

...t reacquisition is considerably faster than the initial learning of the behavior. This is one of the most widely known results in the conditioning literature. It is famous because it is a ubiquitous empirical phenomenon that is seen in almost all conditioning–extinction–reacquisition experiments (for an exception, see Ricker & Bouton, 1996) and because it has posed a difficult challenge for learning theories. For example, fast reacquisition has long been known to disconfirm any theory that assumes learning is purely a process of strengthening associations between stimuli and responses (e.g., Redish, Jensen, Johnson, & Kurth-Nelson, 2007). In such models, response rate is completely determined by the strength of these associations. Conditioning increases the strength from some initial value, and extinction decreases it back to its starting point (if the extinction phase is long enough). Thus, at the beginning of the reacquisition phase, the strength of the stimulus–response association is the same as at the beginning of the initial conditioning phase. As a result, relearning must follow the same course as initial learning. The bottom panel of Figure 7 shows how the model accounts for fast reacquisition. This graph shows the s...
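A toy gating sketch of the mechanism described above, offered as my simplification rather than the authors' spiking network: a TAN pause variable tracks whether the context is rewarding, and the striatal S-R weight is expressed, and plastic, only in proportion to that pause. When reward stops, the gate closes quickly, behavior extinguishes, and the S-R weight survives nearly intact, so reacquisition only requires the gate to reopen. The linear update rules and parameters are illustrative assumptions.

    LR_TAN, LR_SR = 0.3, 0.1

    def run_phase(state, n_trials, rewarded):
        expressed = []
        for _ in range(n_trials):
            gate = state["pause"]                             # TAN pause in [0, 1]
            expressed.append(round(gate * state["sr"], 2))    # striatal output strength
            target = 1.0 if rewarded else 0.0
            state["pause"] += LR_TAN * (target - state["pause"])   # TANs track context reward
            state["sr"] += LR_SR * gate * (target - state["sr"])   # plasticity gated by the pause
        return expressed

    state = {"pause": 0.1, "sr": 0.1}
    print("acquisition   :", run_phase(state, 12, True))
    print("extinction    :", run_phase(state, 12, False))  # gate closes; "sr" barely decays
    print("reacquisition :", run_phase(state, 12, True))   # recovers far faster than acquisition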

Context, Learning, and Extinction

by Samuel J. Gershman, David M. Blei, Yael Niv
"... A. Redish et al. (2007) proposed a reinforcement learning model of context-dependent learning and extinction in conditioning experiments, using the idea of “state classification ” to categorize new observations into states. In the current article, the authors propose an interpretation of this idea i ..."
Abstract - Cited by 7 (4 self) - Add to MetaCart
A. Redish et al. (2007) proposed a reinforcement learning model of context-dependent learning and extinction in conditioning experiments, using the idea of “state classification” to categorize new observations into states. In the current article, the authors propose an interpretation of this idea in terms of normative statistical inference. They focus on renewal and latent inhibition, 2 conditioning paradigms in which contextual manipulations have been studied extensively, and show that online Bayesian inference within a model that assumes an unbounded number of latent causes can characterize a diverse set of behavioral results from such manipulations, some of which pose problems for the model of Redish et al. Moreover, in both paradigms, context dependence is absent in younger animals, or if hippocampal lesions are made prior to training. The authors suggest an explanation in terms of a restricted capacity to infer new causes.
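A minimal local-MAP sketch of the latent-cause idea, offered as a simplification for illustration rather than the authors' full inference scheme: each (context, reward) observation is assigned to whichever existing cause explains it best, or to a brand-new cause whose prior mass is set by a Chinese-restaurant-process concentration parameter. With acquisition in context A and extinction in context B, the extinction trials tend to be attributed to a new cause, so the reward prediction renews when the animal is returned to A. The likelihood model and the concentration value are assumptions.

    CONC = 1.0       # assumed concentration: willingness to posit new causes
    causes = []      # each cause: {"n": count, "ctx": {context: count}, "rew": [a, b]}

    def assign(context, reward):
        # Score each existing cause (prior count x likelihood) against a new, vague cause.
        total = sum(c["n"] for c in causes) + CONC
        scores = []
        for c in causes:
            p_ctx = (c["ctx"].get(context, 0) + 0.5) / (c["n"] + 1.0)
            a, b = c["rew"]
            scores.append((c["n"] / total) * p_ctx * (a / (a + b) if reward else b / (a + b)))
        scores.append((CONC / total) * 0.5 * 0.5)
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(causes):
            causes.append({"n": 0, "ctx": {}, "rew": [1.0, 1.0]})
        c = causes[k]
        c["n"] += 1
        c["ctx"][context] = c["ctx"].get(context, 0) + 1
        c["rew"][0 if reward else 1] += 1.0
        return k

    def predicted_reward(context):
        # Average each cause's reward belief, weighted by prior count and context fit.
        total = sum(c["n"] for c in causes) + CONC
        weights, estimates = [(CONC / total) * 0.5], [0.5]
        for c in causes:
            weights.append((c["n"] / total) * (c["ctx"].get(context, 0) + 0.5) / (c["n"] + 1.0))
            estimates.append(c["rew"][0] / sum(c["rew"]))
        return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    for _ in range(20):
        assign("A", True)          # acquisition in context A
    for _ in range(20):
        assign("B", False)         # extinction in context B spawns a second cause

    print("inferred causes (n, reward counts):", [(c["n"], c["rew"]) for c in causes])
    print("predicted reward back in A:", round(predicted_reward("A"), 2))   # renewal
    print("predicted reward in B     :", round(predicted_reward("B"), 2))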

BOOK REVIEW

by W. Rocky Newman, Glen Corlett, Charles A. Sullivan, Scott A. Dellana, Richard D. Hauser, Gyu C. Kim, Marc J. Schniederjans, Bradley J. Sleeper, Robert J. Walter, Robert J. Calhoun, David N. Hurtt, Jerry G. Kreuze, Sheldon A. Langsam, Jack T. Marchewka, Lynn Neeley, Jane Z. Sojka, Ashok K. Gupta, Timothy P. Hartman, Brian Laverty, Mid-american Journal Business, Neil A. Palomba, John Schleede Dean, Daniel Short Dean, David Graf Dean, Glenn Corlett Dean, James Pope Dean, James Schmotter Dean, Shaheen Borna, Larry Compeau
"... mid-american ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
mid-american
(Show Context)

Citation Context

...sponding to a stimulus that has previously been a consistent predictor of a salient outcome (Lovibond, 2004). Prediction error is also central to extinction (Pedreira et al., 2004). A recent account (Redish et al., 2007) suggests that negative prediction error (a reduction in baseline firing rate of prediction error coding neurons) leads the organism to categorize the extinction situation as different from the origi...

Erasing the Engram: The Unlearning of Procedural Skills

by Matthew J. Crossley, F. Gregory Ashby, W. Todd Maddox
"... This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Huge amounts of money are spent every year on unlearning programs—in drug-treatme ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
Huge amounts of money are spent every year on unlearning programs—in drug-treatment facilities, prisons, psychotherapy clinics, and schools. Yet almost all of these programs fail, since recidivism rates are high in each of these fields. Progress on this problem requires a better understanding of the mechanisms that make unlearning so difficult. Much cognitive neuroscience evidence suggests that an important component of these mechanisms also dictates success on categorization tasks that recruit procedural learning and depend on synaptic plasticity within the striatum. A biologically detailed computational model of this striatal-dependent learning is described (based on Ashby & Crossley, 2011). The model assumes that a key component of striatal-dependent learning is provided by interneurons in the striatum called the tonically active neurons (TANs), which act as a gate for the learning and expression of striatal-dependent behaviors. In their tonically active state, the TANs prevent the expression of any striatal-dependent behavior. However, they learn to pause in rewarding environments and thereby permit the learning and expression of striatal-dependent behaviors. The model predicts that when rewards are no longer contingent on behavior, the TANs cease to pause, which protects striatal learning from decay and prevents unlearning. In addition, the model predicts that when rewards are partially contingent on behavior, the TANs remain partially paused, leaving the striatum available for unlearning. The results from 3 human behavioral studies support the model predictions and suggest a novel unlearning protocol that shows promising initial signs of success.

Citation Context

...challenge for learning theories in general. For example, fast reacquisition disconfirms any theory that assumes learning is purely a process of strengthening associations between stimuli and responses (e.g., Redish, Jensen, Johnson, & Kurth-Nelson, 2007). Partly for this reason, some conditioning researchers have proposed that extinction is not a process of unlearning but rather a process of new learning (Bouton, 2004; Rescorla, 2001). In particular...
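Extending the toy gating sketch given after the Ashby and Crossley (2011) entry above, the fragment below illustrates the contingency prediction in this abstract: if the TAN pause tracks how contingent feedback is on behavior, zero contingency closes the gate and largely freezes the learned S-R weight, whereas partial contingency keeps the gate partly open and allows genuine unlearning. The update rules and constants are again illustrative assumptions, not the published model.

    LR_TAN, LR_SR = 0.3, 0.1

    def intervention_phase(sr, contingency, n_trials=40):
        pause = 1.0                                   # gate fully open after training
        for _ in range(n_trials):
            pause += LR_TAN * (contingency - pause)   # TANs track response-reward contingency
            sr += LR_SR * pause * (0.0 - sr)          # feedback no longer supports the old rule
        return round(sr, 2)

    trained_sr = 0.9                                  # a well-learned S-R association
    for contingency in (0.0, 0.25, 0.5):
        print(f"feedback contingency {contingency:.2f} -> S-R weight afterwards:",
              intervention_phase(trained_sr, contingency))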

Melioration as Rational Choice: Sequential Decision Making in Uncertain Environments

by Chris R. Sims, Hansjörg Neth, Robert A. Jacobs, Wayne D. Gray
"... Melioration—defined as choosing a lesser, local gain over a greater longer term gain—is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of ..."
Abstract - Cited by 3 (3 self) - Add to MetaCart
Melioration—defined as choosing a lesser, local gain over a greater longer term gain—is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of economic rationality. In some environments, the relationship between actions and outcomes is known. In this case, the rationality of choice behavior can be evaluated in terms of how successfully it maximizes utility given knowledge of the environmental contingencies. In most complex environments, however, the relationship between actions and future outcomes is uncertain and must be learned from experience. When the difficulty of this learning challenge is taken into account, it is not evident that melioration represents suboptimal choice behavior. In the present article, we examine human performance in a sequential decision-making experiment that is known to induce meliorating behavior. In keeping with previous results using this paradigm, we find that the majority of participants in the experiment fail to adopt the optimal decision strategy and instead demonstrate a significant bias toward melioration. To explore the origins of this behavior, we develop a rational analysis (Anderson, 1990) of the learning problem facing individuals in uncertain decision environments. Our analysis demonstrates that an unbiased learner would adopt melioration as the optimal response strategy for maximizing long-term gain. We suggest that many documented cases of melioration can be reinterpreted not as irrational choice but rather as globally optimal choice under uncertainty.
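A worked toy payoff structure, with invented numbers rather than the authors' experiment, shows why melioration is locally seductive: at every allocation h (the fraction of recent choices given to the long-term option) the other option pays more right now, yet the long-run reward rate is maximized by allocating everything to the long-term option.

    def payoff_short(h):    # immediate reward for the myopic choice at allocation h
        return 4 + 6 * h

    def payoff_long(h):     # immediate reward for the farsighted choice at allocation h
        return 2 + 6 * h

    for h in (0.0, 0.5, 1.0):
        rate = h * payoff_long(h) + (1 - h) * payoff_short(h)   # long-run rate at allocation h
        print(f"h={h:.1f}: short pays {payoff_short(h):.1f}, long pays {payoff_long(h):.1f}, "
              f"long-run rate {rate:.1f}")

Because the short option pays two units more at any h, a learner that compares only immediate payoffs drifts toward h = 0 and earns 4 per choice, half of what full commitment to the long-term option yields; whether that counts as irrational once the difficulty of learning the contingencies is taken into account is exactly the question the paper addresses.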

Pathological gambling and the loss of willpower: A neurocognitive perspective

by Damien Brevers - Socioaffective Neurosci. Psychol
"... The purpose of this review is to gain more insight on the neurocognitive processes involved in the maintenance of pathological gambling. Firstly, we describe structural factors of gambling games that could promote the repetition of gambling experiences to such an extent that some individuals may bec ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
The purpose of this review is to gain more insight into the neurocognitive processes involved in the maintenance of pathological gambling. Firstly, we describe structural factors of gambling games that could promote the repetition of gambling experiences to such an extent that some individuals may become unable to control their gambling habits. Secondly, we review findings of neurocognitive studies on pathological gambling. As a whole, poor ability to resist gambling is a product of an imbalance involving any one, or a combination, of three key neural systems: (1) a hyperactive ‘impulsive’ system, which is fast, automatic, and unconscious and promotes automatic and habitual actions; (2) a hypoactive ‘reflective’ system, which is slow and deliberative and supports forecasting the future consequences of a behavior, inhibitory control, and self-awareness; and (3) the interoceptive system, translating bottom-up somatic signals into a subjective state of craving, which in turn potentiates the activity of the impulsive system, and/or weakens or hijacks the goal-driven cognitive resources needed for the normal operation of the reflective system. Based on this theoretical background, we focus on certain clinical interventions that could reduce the risks of both gambling addiction and relapse.

Citation Context

...with a very big gain (or a statistically unlikely sequence of wins), the memory of this ‘positive surprise’ or ‘big unexpected strike’ persists and can drive betting behavior despite repeated losses (Redish et al., 2007). As a result, pauses in reward acquisition in gambling fail to extinguish gambling action as they would extinguish most learned responses (Redish et al., 2007). ‘Near-miss’ In gambling, a near-miss ...
