Results 1-10 of 15
Forward models: Supervised learning with a distal teacher
Cognitive Science, 1992
Cited by 421 (9 self)
Internal models of the environment have an important role to play in adaptive systems in general and are of particular importance for the supervised learning paradigm. In this paper we demonstrate that certain classical problems associated with the notion of the "teacher" in supervised learning can be solved by judicious use of learned internal models as components of the adaptive system. In particular, we show how supervised learning algorithms can be utilized in cases in which an unknown dynamical system intervenes between actions and desired outcomes. Our approach applies to any supervised learning algorithm that is capable of learning in multilayer networks.
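The idea in this abstract, using a learned internal model to carry errors from outcomes back to actions, can be illustrated with a small sketch. This is not the paper's implementation: the linear `plant`, the matrices `M` and `C`, the learning rates, and all function names are assumptions chosen to keep the example short. The learner first fits a forward model from (action, outcome) pairs, then trains a controller by passing the outcome error backward through the frozen model.

```python
import numpy as np

def plant(u):
    """Hypothetical environment: the learner can query it but not see inside."""
    W_true = np.array([[2.0, 0.5], [-0.3, 1.5]])
    return W_true @ u

def train_forward_model(rng, steps=2000, lr=0.05):
    """Fit a linear forward model M so that M @ u predicts plant(u)."""
    M = np.zeros((2, 2))
    for _ in range(steps):
        u = rng.normal(size=2)          # random exploratory action
        err = M @ u - plant(u)          # prediction error on the outcome
        M -= lr * np.outer(err, u)      # gradient step on squared error
    return M

def train_controller(M, rng, steps=5000, lr=0.02):
    """Train a linear controller C mapping a desired outcome to an action."""
    C = np.zeros((2, 2))
    for _ in range(steps):
        y_star = rng.normal(size=2)     # desired outcome
        u = C @ y_star                  # proposed action
        distal_err = plant(u) - y_star  # error is only observable at the outcome
        action_grad = M.T @ distal_err  # pass the error back through the frozen model
        C -= lr * np.outer(action_grad, y_star)
    return C

rng = np.random.default_rng(0)
M = train_forward_model(rng)
C = train_controller(M, rng)
```

In this sketch the forward model is only used to convert outcome errors into action errors; the error itself comes from the real plant, so a moderately inaccurate model can still point the controller in a useful direction.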
A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers
1991
Cited by 118 (25 self)
This paper introduces a framework for `curious neural controllers' which employ an adaptive world model for goal-directed online learning. First an online reinforcement learning algorithm for autonomous `animats' is described. The algorithm is based on two fully recurrent `self-supervised' continually running networks which learn in parallel. One of the networks learns to represent a complete model of the environmental dynamics and is called the `model network'. It provides complete `credit assignment paths' into the past for the second network, which controls the animat's physical actions in a possibly reactive environment. The animat's goal is to maximize cumulative reinforcement and minimize cumulative `pain'. The algorithm has properties which make it possible to implement something like the desire to improve the model network's knowledge about the world. This is related to curiosity. It is described how the particular algorithm (as well as similar model-building algorithms) may be augmented ...
Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)
Cited by 75 (16 self)
The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but nonoptimal implementations (1991, 1995, 1997-2002) are reviewed, as well as several recent variants by others (2005). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
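The notion of intrinsic reward for improved prediction can be made concrete with a toy sketch. It is a deliberate simplification and not the formulation from the overview: the predictor is a single running estimate, and "improvement" is approximated as the drop in squared prediction error on the latest observation caused by one learning step. The function name and the learning rate are assumptions.

```python
def learning_progress_rewards(observations, lr=0.2):
    """Return one intrinsic reward per observation: error before minus error after a learning step."""
    prediction = 0.0
    rewards = []
    for x in observations:
        err_before = (x - prediction) ** 2
        prediction += lr * (x - prediction)   # one learning step on this observation
        err_after = (x - prediction) ** 2
        rewards.append(err_before - err_after)
    return rewards

# A perfectly learnable stream: reward is high at first, then fades as nothing is left to learn.
rewards = learning_progress_rewards([1.0] * 50)
```

Under these assumptions the reward shrinks toward zero once the stream is predicted well, which is one simple way to read the "boredom" side of the theory: already-learned regularities stop being rewarding.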
Learning To Generate Artificial Fovea Trajectories For Target Detection
1991
Cited by 17 (5 self)
It is shown how `static' neural approaches to adaptive target detection can be replaced by a more efficient and more sequential alternative. The latter is inspired by the observation that biological systems employ sequential eye movements for pattern recognition. A system is described which builds an adaptive model of the time-varying inputs of an artificial fovea controlled by an adaptive neural controller. The controller uses the adaptive model for learning the sequential generation of fovea trajectories causing the fovea to move to a target in a visual scene. The system also learns to track moving targets. No teacher provides the desired activations of `eye muscles' at various times. The only goal information is the shape of the target. Since the task is a `reward-only-at-goal' task, it involves a complex temporal credit assignment problem. Some implications for adaptive attentive systems in general are discussed.
Artificial curiosity with planning for autonomous perceptual and cognitive development
Proceedings of the First Joint Conference on Development Learning and on Epigenetic Robotics (ICDL-EPIROB), 2011
Cited by 14 (5 self)
Autonomous agents that learn from reward on high-dimensional visual observations must learn to simplify the raw observations in both space (i.e., dimensionality reduction) and time (i.e., prediction), so that reinforcement learning becomes tractable and effective. Training the spatial and temporal models requires an appropriate sampling scheme, which cannot be hard-coded if the algorithm is to be general. Intrinsic rewards are associated with samples that best improve the agent's model of the world. Yet the dynamic nature of an intrinsic reward signal presents a major obstacle to successfully realizing an efficient curiosity drive. TD-based incremental reinforcement learning approaches fail to adapt quickly enough to effectively exploit the curiosity signal. In this paper, a novel artificial curiosity system with planning is implemented, based on developmental or continual learning principles. Least-squares policy iteration is used with an agent's internal forward model, to efficiently assign values for maximizing combined external and intrinsic reward. The properties of this system are illustrated in a high-dimensional, noisy, visual environment that requires the agent to explore. With no useful external value information early on, the self-generated intrinsic values lead to actions that improve both its spatial (perceptual) and temporal (cognitive) models. Curiosity also leads it to learn how it could act to maximize external reward.
Learning Algorithms for Networks with Internal and External Feedback
In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, G. E. Hinton, Proc. of the Connectionist Models Summer School, pages 52-61. San Mateo, CA: Morgan Kaufmann, 1990.
Cited by 10 (8 self)
This paper gives an overview of some novel algorithms for reinforcement learning in nonstationary, possibly reactive environments. I have decided to describe many ideas briefly rather than going into great detail on any one idea. The paper is structured as follows: In the first section some terminology is introduced. Then there follow five sections, each headed by a short abstract. The second section describes the entirely local `neural bucket brigade algorithm'. The third section applies Sutton's TD methods to fully recurrent continually running probabilistic networks. The fourth section describes an algorithm based on system identification and on two interacting fully recurrent `self-supervised' learning networks. The fifth section describes an application of adaptive control techniques to adaptive attentive vision: It demonstrates how `selective attention' can be learned. Finally, the sixth section criticizes methods based on system identification and adaptive critics, and describes ...
New millennium AI and the convergence of history
Challenges to Computational Intelligence, 2007
Cited by 8 (4 self)
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a byproduct of the way humans allocate memory space to past events?
Gradient-based Reinforcement Planning in Policy-Search Methods
Onderwijsinstituut CKI, Utrecht Univ, 2001
Cited by 2 (2 self)
We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.
Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments
1990
Cited by 2 (0 self)
First a brief introduction to reinforcement learning and to supervised learning with recurrent networks in nonstationary environments is given. The introduction also covers the basic principle of `gradient descent through frozen model networks' as employed by Werbos, Jordan, Munro, Robinson and Fallside, and Nguyen and Widrow. This principle allows supervised learning techniques to be employed for reinforcement learning. Then a general algorithm for a reinforcement learning neural network with internal and external feedback in a nonstationary reactive environment is described. Internal feedback is given by connections that allow cyclic activation flow through the network. External feedback is given by output actions that may change the state of the environment thus influencing subsequent input activations. The network's main goal is to receive as much reinforcement (or as little `pain') as possible. In theory, arbitrary time lags between actions and ulterior consequences ar...
Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy ("torques") from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight and an important step toward fully autonomous end-to-end learning from pixels to torques.