Results 11 - 20 of 229
Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion
"... We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling th ..."
Abstract
-
Cited by 43 (3 self)
We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly non-trivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.
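To make the idea of multi-level advice concrete, the sketch below assumes a linear reward r(s) = w·φ(s) and treats each piece of expert advice, whether about a single footstep or a whole coarse path, as a margin constraint saying that the preferred choice should outscore an alternative. This is an illustrative reading rather than the authors' exact formulation; the function names and toy features are invented for the example.

```python
import numpy as np

def hal_train(low_pairs, high_pairs, n_features, lr=0.01, reg=1e-3, epochs=200):
    """Fit reward weights w so that, at each level of the hierarchy, the
    expert-preferred choice outscores the alternative by a margin (hinge loss)."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        grad = reg * w
        for phi_expert, phi_alt in low_pairs + high_pairs:
            if w @ (phi_expert - phi_alt) < 1.0:      # margin violated
                grad -= (phi_expert - phi_alt)
        w -= lr * grad
    return w

# toy usage: three reward features, one piece of advice at each level
low_pairs  = [(np.array([1.0, 0.2, 0.0]), np.array([0.1, 0.9, 0.3]))]   # single footstep
high_pairs = [(np.array([4.0, 1.0, 0.5]), np.array([2.0, 3.0, 1.0]))]   # whole coarse path
print("learned reward weights:", hal_train(low_pairs, high_pairs, n_features=3))
```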
Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization
Proceedings of the 8th Conference on Intelligent Autonomous Systems (IAS-8), 2004
"... We introduce a new method for hierarchical reinforcement learning. Highlevel policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions c ..."
Abstract
-
Cited by 41 (4 self)
We introduce a new method for hierarchical reinforcement learning. High-level policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. Experiments show that this method outperforms several flat reinforcement learning methods in a deterministic task and in a stochastic task.
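As an illustration of the two-level scheme described above, the following sketch clusters raw observations into abstract observations (using scikit-learn's KMeans, an assumption of this example), lets the cluster indices double as subgoals, and keeps a coarse high-level Q-table over subgoals plus one fine-grained low-level Q-table per subgoal, rewarded internally for reaching its subgoal cluster. It is a schematic reconstruction, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# 1) Abstract observations: cluster raw observations; cluster indices double as subgoals.
raw_obs = np.random.rand(500, 4)                     # stand-in for recorded sensor vectors
kmeans = KMeans(n_clusters=6, n_init=10).fit(raw_obs)

def abstract(obs):
    """Map a raw observation to its abstract observation (cluster index)."""
    return int(kmeans.predict(np.asarray(obs).reshape(1, -1))[0])

# 2) Two levels of value functions: coarse over abstract states, fine per subgoal.
n_abstract, n_subgoals, n_actions = 6, 6, 4
Q_high = np.zeros((n_abstract, n_subgoals))              # abstract state -> subgoal
Q_low = np.zeros((n_subgoals, n_abstract, n_actions))    # specialised subpolicy per subgoal

def high_update(z, g, r_ext, z_next, alpha=0.1, gamma=0.99):
    """High level learns from the external reward which subgoal to pursue."""
    Q_high[z, g] += alpha * (r_ext + gamma * Q_high[z_next].max() - Q_high[z, g])

def low_update(g, z, a, z_next, alpha=0.1, gamma=0.99):
    """Low level is paid internally for reaching its assigned subgoal cluster."""
    r_int = 1.0 if z_next == g else 0.0
    Q_low[g, z, a] += alpha * (r_int + gamma * Q_low[g, z_next].max() - Q_low[g, z, a])
```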
Reinforcement Learning in Robotics: A Survey
"... Reinforcement learning offers to robotics a framework and set oftoolsfor the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between di ..."
Abstract
-
Cited by 39 (2 self)
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value function-based and policy search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied.
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
"... We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous ..."
Abstract
-
Cited by 39 (8 self)
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains.
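A minimal sketch of the chaining idea, under the assumption that each skill's initiation set is learned as a classifier over states from which its target (termination) set was reached shortly afterwards, and that the next skill in the chain then targets that initiation set. Class and function names here are hypothetical, and the per-skill option policy that the real method also learns is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_states(traj, target, horizon=10):
    """Label 1 the states from which `target` was reached within `horizon` steps."""
    hits = [t for t, s in enumerate(traj) if target(s)]
    y = np.zeros(len(traj), dtype=int)
    if hits:
        y[max(0, hits[0] - horizon):hits[0]] = 1
    return y

class Skill:
    def __init__(self, target):
        self.target = target                      # termination-set membership test
        self.init_clf = LogisticRegression()      # initiation-set classifier

    def fit(self, trajectories, horizon=10):
        X = np.vstack(trajectories)
        y = np.concatenate([label_states(t, self.target, horizon) for t in trajectories])
        if y.min() == y.max():                    # need both classes to fit a classifier
            return False
        self.init_clf.fit(X, y)
        return True

    def can_start(self, s):
        return bool(self.init_clf.predict(np.asarray(s).reshape(1, -1))[0])

def chain_skills(trajectories, goal_test, max_skills=3):
    """Grow the chain backwards from the goal: each new skill's termination set
    is the previous skill's initiation set."""
    target, chain = goal_test, []
    for _ in range(max_skills):
        skill = Skill(target)
        if not skill.fit(trajectories):
            break
        chain.append(skill)
        target = skill.can_start                  # next skill must deliver the agent here
    return chain
```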
Reinforcement Learning to adjust Robot Movements to New Situations
"... Abstract—Many complex robot motor skills can be represented using elementary movements, and there exist efficient techniques for learning parametrized motor plans using demonstrations and self-improvement. However, in many cases, the robot currently needs to learn a new elementary movement even if a ..."
Abstract
-
Cited by 35 (11 self)
Many complex robot motor skills can be represented using elementary movements, and there exist efficient techniques for learning parametrized motor plans using demonstrations and self-improvement. However, in many cases, the robot currently needs to learn a new elementary movement even if a parametrized motor plan exists that covers a similar, related situation. Clearly, a method is needed that modulates the elementary movement through the meta-parameters of its representation. In this paper, we show how to learn such mappings from circumstances to meta-parameters using reinforcement learning. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. We compare this algorithm to several previous methods on a toy example and show that it performs well in comparison to standard algorithms. Subsequently, we show two robot applications of the presented setup; i.e., the generalization of throwing movements in darts, and of hitting movements in table tennis. We show that both tasks can be learned successfully using simulated and real robots.
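One simplified reading of the "kernelized version of the reward-weighted regression" mentioned above: store triples of situation, meta-parameters, and obtained reward, then predict meta-parameters for a new situation as a kernel-weighted average in which nearby, high-reward samples dominate. The RBF kernel, the exponential reward weighting, and the toy dart-style numbers are assumptions of this sketch, not the paper's exact update rule.

```python
import numpy as np

def rbf(a, b, bw=0.5):
    """Radial basis function kernel measuring similarity of two situations."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * bw ** 2))

def predict_meta_params(s_query, S, Theta, R, temperature=1.0, bw=0.5):
    """Reward-weighted kernel regression from situations S (n, d) to
    meta-parameters Theta (n, m), with rewards R (n,) as extra weights."""
    k = np.array([rbf(s_query, s_i, bw) for s_i in S])
    w = k * np.exp(R / temperature)           # nearby, high-reward samples dominate
    w = w / (w.sum() + 1e-12)
    return w @ Theta                          # weighted average of stored meta-parameters

# toy usage: 1-D situation (target distance) -> 2-D meta-parameter (speed, angle)
S = np.array([[1.0], [2.0], [3.0]])
Theta = np.array([[0.8, 10.0], [1.1, 20.0], [1.5, 30.0]])
R = np.array([0.9, 0.7, 0.95])
print(predict_meta_params(np.array([2.5]), S, Theta, R))
```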
Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning
The AAAI Spring Symposium on Developmental Robotics, 2005
"... One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an a ..."
Abstract
-
Cited by 32 (1 self)
One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an algorithm for intrinsically motivated reinforcement learning that strives to achieve broad competence in an environment in a task-nonspecific manner by incorporating internal reward to build a hierarchical collection of skills. This paper suggests that with its emphasis on task-general, self-motivated, and hierarchical learning, intrinsically motivated reinforcement learning is an obvious choice for organizing behavior in developmental robotics. We present additional preliminary results from a gridworld abstraction of a robot environment and advocate a layered learning architecture for applying the algorithm on a physically embodied system.
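A toy sketch of one common reading of the internal-reward idea referenced above: the intrinsic reward for a salient event is the error of a learned predictor of that event, so novel events pay well and the payoff fades as the corresponding skill is mastered. The class below is hypothetical and only illustrates that decay, not the cited algorithm itself.

```python
import numpy as np

class IntrinsicRewardModel:
    """Intrinsic reward = surprise: the error of a learned predictor of whether a
    salient event follows the current state-action pair."""
    def __init__(self, n_states, n_actions, lr=0.2):
        self.p = np.zeros((n_states, n_actions))   # predicted probability of the event
        self.lr = lr

    def step(self, s, a, event_occurred):
        error = float(event_occurred) - self.p[s, a]
        self.p[s, a] += self.lr * error
        return abs(error)                          # intrinsic reward

model = IntrinsicRewardModel(n_states=10, n_actions=2)
for t in range(5):
    # the same salient event keeps following (state 3, action 1): novelty wears off
    print(round(model.step(3, 1, event_occurred=True), 3))
```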
A computational model of the cerebral cortex
Proceedings of AAAI-05, pp. 938–943, 2005
"... Our current understanding of the primate cerebral cortex (neocortex) and in particular the posterior, sensory association cortex has matured to a point where it is possible to develop a family of graphical models that capture the structure, scale and power of the neocortex for purposes of associativ ..."
Abstract
-
Cited by 30 (4 self)
Our current understanding of the primate cerebral cortex (neocortex) and in particular the posterior, sensory association cortex has matured to a point where it is possible to develop a family of graphical models that capture the structure, scale and power of the neocortex for purposes of associative recall, sequence prediction and pattern completion among other functions. Implementing such models using readily available computing clusters is now within the grasp of many labs and would provide scientists with the opportunity to experiment with both hard-wired connection schemes and structure-learning algorithms inspired by animal learning and developmental studies. While neural circuits involving structures external to the neocortex such as the thalamic nuclei are less well understood, the availability of a computational model on which to test hypotheses would likely accelerate our understanding of these circuits. Furthermore, the existence of an agreed-upon cortical substrate would not only facilitate our understanding of the brain but enable researchers to combine lessons learned from biology with state-of-the-art graphical-model and machine-learning techniques to design hybrid systems that combine the best of biological and traditional computing approaches.
Constructing skill trees for reinforcement learning agents from demonstration trajectories
Advances in Neural Information Processing Systems (NIPS), 2010
"... We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too c ..."
Abstract
-
Cited by 30 (7 self)
We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
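CST's actual segmentation is an online MAP changepoint detector over candidate abstractions; the following is only a simplified offline illustration of the underlying idea, namely splitting a trajectory wherever two local linear models of the return fit much better than one. Names and thresholds are invented for the example.

```python
import numpy as np

def fit_error(x, y):
    """Squared error of the best linear fit y ≈ a*x + b over one segment."""
    A = np.vstack([x, np.ones_like(x)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y) ** 2))

def segment(x, y, min_len=5, gain_threshold=1.0):
    """Greedy top-down segmentation: split wherever two local models beat one by a margin."""
    def rec(lo, hi):
        if hi - lo < 2 * min_len:
            return [(lo, hi)]
        whole = fit_error(x[lo:hi], y[lo:hi])
        best_gain, best_t = 0.0, None
        for t in range(lo + min_len, hi - min_len):
            gain = whole - (fit_error(x[lo:t], y[lo:t]) + fit_error(x[t:hi], y[t:hi]))
            if gain > best_gain:
                best_gain, best_t = gain, t
        if best_t is None or best_gain < gain_threshold:
            return [(lo, hi)]
        return rec(lo, best_t) + rec(best_t, hi)
    return rec(0, len(x))

# toy usage: a return signal with a kink at t = 50 should split into two segments
t = np.arange(100, dtype=float)
ret = np.where(t < 50, 2.0 * t, 100.0 + 0.1 * (t - 50))
print(segment(t, ret))
```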
Hierarchical relative entropy policy search
Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012
"... Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an agent’s policy may well be the key to improved scalability and higher per-formance. However, such hierarchical struc-tures cannot be exploited by current policy search algorithms. We will concentrate ..."
Abstract
-
Cited by 25 (9 self)
Many real-world problems are inherently hierarchically structured. The use of this structure in an agent's policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy: the 'mixed option' policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.
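To sketch the 'mixed option' structure in code: treat the option as a latent variable, compute responsibilities from the gate and each option's likelihood of the observed action, then perform a reward-weighted update of the options and the gate, with REPS-style sample weights exp(R / eta). This is a schematic weighted-EM step with invented names and a plain gradient step for the gate, not the paper's exact dual formulation.

```python
import numpy as np

def mixed_option_update(S, A, R, means, gate_w, eta=1.0, sigma=0.5):
    """One weighted-EM-style update for a mixed-option policy.
    S: (n, d) states, A: (n,) actions, R: (n,) rewards,
    means: (k, d) per-option regression weights, gate_w: (k, d) gate weights."""
    n, d = S.shape
    k = means.shape[0]
    w = np.exp((R - R.max()) / eta)                      # REPS-style sample weights

    # E-step: responsibility of each option for each sample (latent-variable posterior)
    gate_logits = S @ gate_w.T                           # (n, k)
    log_gate = gate_logits - np.logaddexp.reduce(gate_logits, axis=1, keepdims=True)
    log_lik = -0.5 * ((A[:, None] - S @ means.T) / sigma) ** 2
    resp = np.exp(log_gate + log_lik)
    resp /= resp.sum(axis=1, keepdims=True) + 1e-12

    # M-step: reward-weighted regression per option, gate nudged toward responsibilities
    new_means = np.zeros_like(means)
    for o in range(k):
        W = np.diag(w * resp[:, o])
        new_means[o] = np.linalg.solve(S.T @ W @ S + 1e-6 * np.eye(d), S.T @ W @ A)
    new_gate = gate_w + 0.1 * ((w[:, None] * (resp - np.exp(log_gate))).T @ S) / n
    return new_means, new_gate

# toy usage: actions should track the first state feature
S = np.random.randn(200, 3)
A = S[:, 0] + 0.1 * np.random.randn(200)
R = -np.abs(A - S[:, 0])
means, gate = mixed_option_update(S, A, R, means=np.zeros((2, 3)), gate_w=np.zeros((2, 3)))
```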
The decentralised coordination of self-adaptive components for autonomic distributed systems
"... I, the undersigned, declare that this work has not previously been submitted to this or any other University, and that unless otherwise stated, it is entirely my own work. ..."
Abstract
-
Cited by 23 (3 self)
I, the undersigned, declare that this work has not previously been submitted to this or any other University, and that unless otherwise stated, it is entirely my own work.