Maximizing learning progress: an internal reward system for development
- Embodied Artificial Intelligence, LNCS 3139, 2004
Cited by 42 (7 self)
Abstract. This chapter presents a generic internal reward system that drives an agent to increase the complexity of its behavior. This reward system does not reinforce a predefined task. Its purpose is to drive the agent to progress in learning given its embodiment and the environment in which it is placed. The dynamics created by such a system are studied first in a simple environment and then in the context of active vision.
Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling
- Psychological Review, 2007
Cited by 41 (4 self)
Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL models are based on the hypothesis that dopamine carries a reward prediction error signal; these models predict reward by driving that reward error to zero. The authors construct a TDRL model that can accommodate extinction and renewal through two simple processes: (a) a TDRL process that learns the value of situation–action pairs and (b) a situation recognition process that categorizes the observed cues into situations. This model has implications for dysfunctional states, including relapse after addiction and problem gambling.
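The "extinction as unlearning" behavior attributed to TDRL above can be illustrated with a minimal sketch. It is a plain TD(0) update on a single cue, not the authors' model (which adds a situation recognition process); the function names, learning rate, and trial counts are illustrative assumptions.

```python
def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """One TD(0) step; returns (new_value, reward_prediction_error)."""
    delta = reward + gamma * next_value - value  # reward prediction error
    return value + alpha * delta, delta


def run_trials(value, reward, n_trials, alpha=0.1):
    """Repeat a single cue -> outcome trial; the outcome state is terminal (value 0)."""
    delta = 0.0
    for _ in range(n_trials):
        value, delta = td_update(value, reward, 0.0, alpha)
    return value, delta


# Acquisition: the cue is rewarded, so the prediction error shrinks toward zero.
acquired, acq_delta = run_trials(0.0, 1.0, 200)

# Extinction: reward is withheld, so the same rule simply erases the learned value.
extinguished, ext_delta = run_trials(acquired, 0.0, 200)
```

Because extinction here overwrites the single stored value, nothing remains from which the association could be quickly renewed, which is the limitation the abstract describes.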
A mechanism for error detection in speeded response time tasks
- Journal of Experimental Psychology: General, 2005
Cited by 37 (11 self)
The concept of error detection plays a central role in theories of executive control. In this article, the authors present a mechanism that can rapidly detect errors in speeded response time tasks. This error monitor assigns values to the output of cognitive processes involved in stimulus categorization and response generation and detects errors by identifying states of the system associated with negative value. The mechanism is formalized in a computational model based on a recent theoretical framework for understanding error processing in humans (C. B. Holroyd & M. G. H. Coles, 2002). The model is used to simulate behavioral and event-related brain potential data in a speeded response time task, and the results of the simulation are compared with empirical data. Frontal parts of the brain, including the prefrontal cortex (Luria, 1973; Stuss & Knight, 2002), the anterior cingulate cortex (Devinsky, Morrell, & Vogt, 1995; Posner & DiGirolamo, 1998), and their connections with the basal ganglia (L. L. Brown, Schneider, & Lidsky, 1997; Cummings, 1993), are thought to compose an executive system for cognitive control. The functions of this system are thought to include setting high-level goals, directing other
The Neuromodulatory System: A Framework for Survival and Adaptive Behavior in a Challenging World
- Adaptive Behavior, 2008
Cited by 24 (5 self)
Pupil diameter predicts changes in the exploration-exploitation tradeoff: evidence for the adaptive gain theory
- J Cogn Neurosci
Cited by 22 (0 self)
The adaptive regulation of the balance between exploitation and exploration is critical for the optimization of behavioral performance. Animal research and computational modeling have suggested that changes in exploitative versus exploratory control state in response to changes in task utility are mediated by the neuromodulatory locus coeruleus–norepinephrine (LC–NE) system. Recent studies have suggested that utility-driven changes in control state correlate with pupil diameter, and that pupil diameter can be used as an indirect marker of LC activity. We measured participants' pupil diameter while they performed a gambling task with a gradually changing payoff structure. Each choice in this task can be classified as exploitative or exploratory using a computational model of reinforcement learning. We examined the relationship between pupil diameter, task utility, and choice strategy (exploitation vs. exploration), and found that (i) exploratory choices were preceded by a larger baseline pupil diameter than exploitative choices; (ii) individual differences in baseline pupil diameter were predictive of an individual's tendency to explore; and (iii) changes in pupil diameter surrounding the transition between exploitative and exploratory choices correlated with changes in task utility. These findings provide novel evidence that pupil diameter correlates closely with control state, and are consistent with a role for the LC–NE system in the regulation of the exploration–exploitation trade-off in humans.
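One common way to classify choices as exploitative or exploratory with a reinforcement learning model is sketched below. The abstract does not specify the model used, so the delta-rule value estimates, the learning rate, and the function name `classify_choices` are assumptions for illustration only.

```python
def classify_choices(choices, rewards, n_arms, alpha=0.5):
    """Label each choice 'exploit' if it targets a highest-valued option, else 'explore'.

    Values are tracked with a simple delta rule (an assumed model, not the paper's).
    Ties for the highest value count as exploitation.
    """
    values = [0.0] * n_arms
    labels = []
    for arm, reward in zip(choices, rewards):
        best = max(values)  # highest estimated value before this choice
        labels.append("exploit" if values[arm] == best else "explore")
        values[arm] += alpha * (reward - values[arm])  # update the chosen option only
    return labels


# Hypothetical two-option session: option 0 pays off, option 1 is sampled once.
labels = classify_choices(choices=[0, 0, 1, 0], rewards=[1, 1, 0, 1], n_arms=2)
```

In this hypothetical session only the third choice is labeled exploratory, because it selects the option with the lower estimated value at that point.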
In search of the neural circuits of intrinsic motivation
- Frontiers in Neuroscience, 2007
Cited by 20 (8 self)
A.: The Psikharpax project: Towards building an artificial rat. Robotics and Autonomous Systems 50(4
, 2005
Cited by 18 (12 self)
Drawing inspiration from biology, the Psikharpax project aims at endowing a robot with sensorimotor equipment and a neural control architecture that will afford some of the capacities of autonomy and adaptation that are exhibited by real rats. The paper summarizes the current state of achievement of the project. It successively describes the robot’s future sensors and actuators, and several biomimetic models of the anatomy and physiology of structures in the rat’s brain, such as the hippocampus and the basal ganglia, which have already been at work on various robots and which make navigation and action selection possible. Preliminary results on the implementation of learning mechanisms in these structures are also presented. Finally, the article discusses the potential benefits that a biologically inspired approach affords to traditional autonomous robotics.
Principles underlying the construction of brain-based devices
- University of Bristol, 2006
Cited by 11 (0 self)
Without a doubt the most sophisticated behaviour seen in biological agents is demonstrated by organisms whose behaviour is guided by a nervous system. Thus, the construction of behaving devices based on principles of nervous systems may have much to offer. Our group has built a series of brain-based devices (BBDs) over the last 14 years to provide a heuristic for studying brain function by embedding neurobiological principles on a physical platform capable of interacting with the real world. These BBDs have been used to study perception, operant conditioning, episodic and spatial memory, and motor control through the simulation of brain regions such as the visual cortex, the dopaminergic reward system, the hippocampus, and the cerebellum. Following the brain-based model, we argue that an intelligent machine should be constrained by the following design principles: (i) it should incorporate a simulated brain with detailed neuroanatomy and neural dynamics that controls behaviour and shapes memory, (ii) it should organize the unlabeled signals it receives from the environment into categories without a priori knowledge or instruction, (iii) it should have a physical instantiation, which allows for active sensing and autonomous movement in the environment, (iv) it should engage in a task that is initially constrained by a minimal set of innate behaviours or reflexes, (v) it should have a means, called value systems, to adapt the device’s behaviour when an important environmental event occurs, and (vi) it should allow comparisons with experimental data acquired from animal nervous systems. Like the brain, these devices operate according to selectional principles through which they form categorical memory, associate categories with innate value, and adapt to the environment. Moreover, this approach may provide the groundwork for the development of intelligent machines that follow neurobiological rather than computational principles in their construction.
Emotion and reinforcement: Affective facial expressions facilitate robot learning
- Artificial Intelligence for Human Computing, Lecture Notes in Artificial Intelligence, Volume 4451, 2007
Cited by 9 (4 self)
Computer models can be used to investigate the role of emotion in learning. Here we present EARL, our framework for the systematic study of the relation between emotion, adaptation and reinforcement learning (RL). EARL enables the study of, among other things, communicated affect as reinforcement to the robot; the focus of this paper. In humans, emotions are crucial to learning. For example, a parent—observing a child—uses emotional expression to encourage or discourage specific behaviors. Emotional expression can therefore be a reinforcement signal to a child. We hypothesize that affective facial expressions facilitate robot learning, and compare a social setting with a nonsocial