Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13:41–77 (2003)

by A. Barto, S. Mahadevan

Results 11 - 20 of 229

Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion

by J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng
"... We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling th ..."
Abstract - Cited by 43 (3 self) - Add to MetaCart
We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly non-trivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.

Citation Context

...y given some candidate reward function, and even this is challenging in large domains. Indeed, such domains often necessitate hierarchical control in order to reduce the complexity of the control task [2, 4, 13, 10]. As a motivating application, consider the task of navigating a quadruped robot (shown in Figure 1(a)) over challenging, irregular terrain (shown in Figure 1(b,c)). In a naive approach, the dimensiona...

Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization

by Bram Bakker, Jürgen Schmidhuber - Proceedings of the 8th Conference on Intelligent Autonomous Systems (IAS-8), 2004
"... We introduce a new method for hierarchical reinforcement learning. Highlevel policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions c ..."
Abstract - Cited by 41 (4 self) - Add to MetaCart
We introduce a new method for hierarchical reinforcement learning. High-level policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. Experiments show that this method outperforms several flat reinforcement learning methods in a deterministic task and in a stochastic task.
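
To make the coarse/fine division described above concrete, the following is a minimal tabular sketch added for this listing: it assumes subgoals coincide with clustered abstract observations and that the low level receives an intrinsic reward for reaching its assigned subgoal. It is not the authors' implementation; all names and the clustering step are illustrative assumptions.

    import numpy as np
    from collections import defaultdict

    class TwoLevelAgent:
        """Sketch: the high level picks a subgoal (an abstract observation cluster);
        one specialized low-level Q-table learns to reach each subgoal."""
        def __init__(self, n_subgoals, n_actions, alpha=0.1, gamma=0.95):
            self.alpha, self.gamma = alpha, gamma
            self.n_subgoals, self.n_actions = n_subgoals, n_actions
            # Coarse values: abstract observation -> value of committing to each subgoal.
            self.Q_high = defaultdict(lambda: np.zeros(n_subgoals))
            # Fine-grained values, one table per subgoal, covering only visited states.
            self.Q_low = [defaultdict(lambda: np.zeros(n_actions)) for _ in range(n_subgoals)]

        def abstract(self, observation, centroids):
            # Map a raw observation to its nearest cluster: the "abstract observation".
            return int(np.argmin([np.linalg.norm(observation - c) for c in centroids]))

        def choose_subgoal(self, abs_obs, eps=0.1):
            if np.random.rand() < eps:
                return np.random.randint(self.n_subgoals)
            return int(np.argmax(self.Q_high[abs_obs]))

        def choose_action(self, subgoal, state, eps=0.1):
            if np.random.rand() < eps:
                return np.random.randint(self.n_actions)
            return int(np.argmax(self.Q_low[subgoal][state]))

        def update_low(self, subgoal, s, a, r_intrinsic, s_next):
            q = self.Q_low[subgoal]
            q[s][a] += self.alpha * (r_intrinsic + self.gamma * q[s_next].max() - q[s][a])

        def update_high(self, abs_obs, subgoal, r_extrinsic, abs_obs_next):
            q = self.Q_high
            td = r_extrinsic + self.gamma * q[abs_obs_next].max() - q[abs_obs][subgoal]
            q[abs_obs][subgoal] += self.alpha * td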

Citation Context

...asily, either within the same task or in other tasks. Most studies on HRL assume that the hierarchical structure itself is given by a designer, and they learn policies within this hardwired structure [3, 11, 4, 2]. Since an important goal of machine learning is to minimize the designer’s role, it is desirable to learn the hierarchy itself. As Barto and Mahadevan [2] put it, “a key open question is how to form ...

Reinforcement Learning in Robotics: A Survey

by Jens Kober, J. Andrew Bagnell , Jan Peters
"... Reinforcement learning offers to robotics a framework and set oftoolsfor the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between di ..."
Abstract - Cited by 39 (2 self) - Add to MetaCart
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value function-based and policy search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and ...

Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

by George Konidaris, Andrew G. Barto
"... We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous ..."
Abstract - Cited by 39 (8 self) - Add to MetaCart
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains.
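
As a rough illustration of the idea in this abstract (skills are built backward from the goal, so each new skill's target is the previous skill's initiation set), here is a simplified offline sketch. The episode-collection and goal-test callables, the logistic-regression initiation classifier, and the horizon threshold are assumptions of this sketch, not details of the published algorithm.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def chain_skills(collect_episodes, goal_test, n_skills, horizon=100):
        """Return a backward chain of skills: skill i terminates in the
        initiation set of skill i-1 (the task goal region for the first skill)."""
        skills, target = [], goal_test
        for _ in range(n_skills):
            positives, negatives = [], []
            for states in collect_episodes():            # trajectories through the domain
                hit = next((t for t, s in enumerate(states) if target(s)), None)
                if hit is None:
                    negatives.extend(states)
                else:
                    # States visited shortly before entering the target are positive
                    # examples for the new skill's initiation set.
                    positives.extend(states[max(0, hit - horizon):hit])
                    negatives.extend(states[:max(0, hit - horizon)])
            # Assumes both classes were observed; a real system would handle failure.
            clf = LogisticRegression().fit(np.array(positives + negatives),
                                           [1] * len(positives) + [0] * len(negatives))
            initiation = lambda s, clf=clf: bool(clf.predict([s])[0])
            skills.append({"initiation": initiation, "terminates_in": target})
            target = initiation                          # the next skill must reach here
        return skills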

Citation Context

... challenging continuous domain and that doing so results in performance gains. 1 Introduction Much recent research in reinforcement learning has focused on hierarchical reinforcement learning methods [1] and in particular the options framework [2], which adds principled methods for planning and learning using high-level skills (called options) to the standard reinforcement learning framework. An impo...

Reinforcement Learning to adjust Robot Movements to New Situations

by Jens Kober, Erhan Oztop
"... Abstract—Many complex robot motor skills can be represented using elementary movements, and there exist efficient techniques for learning parametrized motor plans using demonstrations and self-improvement. However, in many cases, the robot currently needs to learn a new elementary movement even if a ..."
Abstract - Cited by 35 (11 self) - Add to MetaCart
Many complex robot motor skills can be represented using elementary movements, and there exist efficient techniques for learning parametrized motor plans using demonstrations and self-improvement. However, in many cases, the robot currently needs to learn a new elementary movement even if a parametrized motor plan exists that covers a similar, related situation. Clearly, a method is needed that modulates the elementary movement through the meta-parameters of its representation. In this paper, we show how to learn such mappings from circumstances to meta-parameters using reinforcement learning. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. We compare this algorithm to several previous methods on a toy example and show that it performs well in comparison to standard algorithms. Subsequently, we show two robot applications of the presented setup; i.e., the generalization of throwing movements in darts, and of hitting movements in table tennis. We show that both tasks can be learned successfully using simulated and real robots.
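
For readers unfamiliar with reward-weighted regression, the snippet below sketches one common kernelized form in which high-reward samples dominate the prediction of meta-parameters for a new situation. The Gaussian kernel, the reward-to-cost mapping, and the regularizer are assumptions of this sketch and may differ from the algorithm actually used in the paper.

    import numpy as np

    def gaussian_kernel(A, B, bandwidth=1.0):
        # Pairwise squared distances between rows of A and B.
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2.0 * bandwidth ** 2))

    def predict_meta_parameters(S, Gamma, R, s_query, lam=1e-3, bandwidth=1.0):
        """S: situations seen so far (n, d); Gamma: meta-parameters tried (n, m);
        R: positive rewards (n,). Returns predicted meta-parameters for s_query."""
        K = gaussian_kernel(S, S, bandwidth)
        k = gaussian_kernel(np.atleast_2d(s_query), S, bandwidth)[0]
        # Low reward -> high cost -> that sample influences the prediction less.
        C = np.diag(1.0 / np.clip(R, 1e-6, None))
        return k @ np.linalg.solve(K + lam * C, Gamma)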

Citation Context

...-parameter function in this context but cannot show timing and velocity and it requires a careful observer to note the important configuration differences resulting from the meta-parameters. approach [24] (as introduced in the early work by [25]). In this framework, the motor primitives with meta-parameter functions could be seen as the robotics counterpart of options [9] or macro-actions [26]. REFERENCES...

Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning

by Andrew Stout, George D. Konidaris, Andrew G. Barto - In The AAAI Spring Symposium on Developmental Robotics, 2005
"... One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an a ..."
Abstract - Cited by 32 (1 self) - Add to MetaCart
One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an algorithm for intrinsically motivated reinforcement learning that strives to achieve broad competence in an environment in a task-nonspecific manner by incorporating internal reward to build a hierarchical collection of skills. This paper suggests that with its emphasis on task-general, self-motivated, and hierarchical learning, intrinsically motivated reinforcement learning is an obvious choice for organizing behavior in developmental robotics. We present additional preliminary results from a gridworld abstraction of a robot environment and advocate a layered learning architecture for applying the algorithm on a physically embodied system.

Citation Context

...another option, creating an elegant mechanism for behavioral hierarchy. The options framework has a solid theoretical foundation, extending Markov decision processes to semi-Markov decision processes (Barto & Mahadevan 2003), and two components of the options framework are particularly important to the algorithm presented below: Option Models are probabilistic descriptions of the effects of executing an option. They can...

A computational model of the cerebral cortex

by Thomas Dean - In Proceedings of AAAI-05, 938–943, 2005
"... Our current understanding of the primate cerebral cortex (neocortex) and in particular the posterior, sensory association cortex has matured to a point where it is possible to develop a family of graphical models that capture the structure, scale and power of the neocortex for purposes of associativ ..."
Abstract - Cited by 30 (4 self) - Add to MetaCart
Our current understanding of the primate cerebral cortex (neocortex) and in particular the posterior, sensory association cortex has matured to a point where it is possible to develop a family of graphical models that capture the structure, scale and power of the neocortex for purposes of associative recall, sequence prediction and pattern completion among other functions. Implementing such models using readily available computing clusters is now within the grasp of many labs and would provide scientists with the opportunity to experiment with both hard-wired connection schemes and structure-learning algorithms inspired by animal learning and developmental studies. While neural circuits involving structures external to the neocortex such as the thalamic nuclei are less well understood, the availability of a computational model on which to test hypotheses would likely accelerate our understanding of these circuits. Furthermore, the existence of an agreed-upon cortical substrate would not only facilitate our understanding of the brain but enable researchers to combine lessons learned from biology with state-of-the-art graphical-model and machine-learning techniques to design hybrid systems that combine the best of biological and traditional computing approaches.

Constructing skill trees for reinforcement learning agents from demonstration trajectories

by George Konidaris, Scott Kuindersma, Andrew Barto, Roderic Grupen - In Advances in Neural Information Processing Systems (NIPS), 2010
"... We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too c ..."
Abstract - Cited by 30 (7 self) - Add to MetaCart
We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
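
The segmentation step can be pictured with a deliberately simplified, offline stand-in: place a boundary wherever a single simple model stops fitting the data within a segment. CST itself uses an online MAP changepoint-detection method over a library of abstractions, so the linear least-squares fit and the thresholds below are illustrative placeholders only.

    import numpy as np

    def segment_trajectory(features, returns, max_error=1.0, min_len=5):
        """Greedily grow a segment; start a new one when a single linear model
        can no longer fit the observed returns well (a crude changepoint test)."""
        boundaries, start = [], 0
        for t in range(min_len, len(features)):
            X = np.asarray(features[start:t + 1])
            y = np.asarray(returns[start:t + 1])
            _, residuals, *_ = np.linalg.lstsq(X, y, rcond=None)
            err = residuals[0] / len(y) if residuals.size else 0.0
            if err > max_error and t - start >= min_len:
                boundaries.append(t)      # candidate changepoint: segment becomes a skill
                start = t
        return boundaries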

Citation Context

...be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction. 1 Introduction Hierarchical reinforcement learning [1] offers an appealing family of approaches to scaling up standard reinforcement learning (RL) [2] methods by enabling the use of both low-level primitive actions and higher-level macro-actions (or skil...

Hierarchical relative entropy policy search

by Christian Daniel, Gerhard Neumann, Jan Peters - In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012
"... Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an agent’s policy may well be the key to improved scalability and higher per-formance. However, such hierarchical struc-tures cannot be exploited by current policy search algorithms. We will concentrate ..."
Abstract - Cited by 25 (9 self) - Add to MetaCart
Many real-world problems are inherently hierarchically structured. The use of this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy — the ‘mixed option’ policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.
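
The ‘mixed option’ policy class itself is simple to write down; the sketch below shows a softmax gate selecting an option and a linear-Gaussian option-policy emitting the action. It illustrates only the policy structure, not the Hierarchical REPS latent-variable update, and all parameter shapes are assumptions of this sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    def gate_probabilities(state, gate_params):
        """gate_params: (n_options, state_dim); softmax over per-option scores."""
        scores = gate_params @ state
        scores -= scores.max()                       # for numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def sample_action(state, gate_params, option_means, option_std=0.1):
        """option_means: (n_options, action_dim, state_dim) linear option-policies."""
        probs = gate_probabilities(state, gate_params)
        o = rng.choice(len(probs), p=probs)          # the gating network picks an option
        mean = option_means[o] @ state               # that option-policy determines the action
        return o, rng.normal(mean, option_std)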

Citation Context

...cannot exploit the hierarchical structure inherent to many real-world problems. Introducing such structures has the potential to improve scalability as well as performance of policy search algorithms [15]. For example, many problems require learning a ‘mixed option’ policy. That is, given a set of parametrized options, also called motion templates [16], a gating network first determines the option to ...

The decentralised coordination of self-adaptive components for autonomic distributed systems

by Jim Dowling, Tim Walsh, Donal Lafferty, Andronikos Nedos, Johan Andersson, Mads Haahr, Kulpreet Singh, Vinny Reynolds, Elisa Baniassad, Simon Dobson, Stephen Farrell, Siobhan Clarke, Christian Jensen, Rita Mcguinness, Peter Barron, Greg Biegel, Rene Meier, Dominik Dahlem, Ivana Dusparic, David Mckitterick, Andy Edmonds, Liz Gray, John Keeney
"... I, the undersigned, declare that this work has not previously been submitted to this or any other University, and that unless otherwise stated, it is entirely my own work. ..."
Abstract - Cited by 23 (3 self) - Add to MetaCart
I, the undersigned, declare that this work has not previously been submitted to this or any other University, and that unless otherwise stated, it is entirely my own work.

Citation Context

...apter. The terminology used to describe the CRL algorithm is consistent with the terminology from the RL literature (Sutton and Barto, 1998; Kaelbling et al., 1996; Littman and Boyan, 1993; Sutton, 1988; Barto and Mahadevan, 2003). In particular, the term agent is used to describe autonomous decision-making entities in a system that execute and learn a 1st-party decision policy. 4.1 Decentralised Coordination and Autonomic ...
