Results 11 - 20
of
36
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
"... We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains. 1
Skill characterization based on betweenness
- In Advances in Neural Information Processing Systems 22
, 2009
"... We present a characterization of a useful class of skills based on a graphical representation of an agent’s interaction with its environment. Our characterization uses betweenness, a measure of centrality on graphs. It captures and generalizes (at least intuitively) the bottleneck concept, which has ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
We present a characterization of a useful class of skills based on a graphical representation of an agent’s interaction with its environment. Our characterization uses betweenness, a measure of centrality on graphs. It captures and generalizes (at least intuitively) the bottleneck concept, which has inspired many of the existing skill-discovery algorithms. Our characterization may be used directly to form a set of skills suitable for a given task. More importantly, it serves as a useful guide for developing incremental skill-discovery algorithms that do not rely on knowing or representing the interaction graph in its entirety. 1
Reinforcement learning in the brain
"... Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computation ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and
Transfer of Policies Based on Trajectory Libraries
"... Abstract — Libraries of trajectories are a promising way of creating policies for difficult problems. However, often it is not desirable or even possible to create a new library for every task. We present a method for transferring libraries across tasks, which allows us to build libraries by learnin ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Abstract — Libraries of trajectories are a promising way of creating policies for difficult problems. However, often it is not desirable or even possible to create a new library for every task. We present a method for transferring libraries across tasks, which allows us to build libraries by learning from demonstration on one task and apply them to similar tasks. Representing the libraries in a feature-based space is key to supporting transfer. We also search through the library to ensure a complete path to the goal is possible. Results are shown for the Little Dog task. Little Dog is a quadruped robot
The development of hierarchical knowledge in robot systems
, 2009
"... This dissertation would not have been possible without the help and support of many people. Most of all, I would like to extend my gratitude to Rod Grupen for many years of inspiring work, our discussions, and his guidance. Without his support and vision, I cannot imagine that the journey would have ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
This dissertation would not have been possible without the help and support of many people. Most of all, I would like to extend my gratitude to Rod Grupen for many years of inspiring work, our discussions, and his guidance. Without his support and vision, I cannot imagine that the journey would have been as enormously enjoyable and rewarding as it turned out to be. I am very excited about what we discovered during my time at UMass, but there is much more to be done. I look forward to what comes next! In addition to providing professional inspiration, Rod was a great person to work with and for—creating a warm and encouraging laboratory atmosphere, motivating us to stay in shape for his annual half-marathons, and ensuring a sufficient amount of cake at the weekly lab meetings. Thanks for all your support, Rod! I am very grateful to my thesis committee—Andy Barto, David Jensen, and Rachel Keen—for many encouraging and inspirational discussions. Their comments and feedback significantly contributed to the form of this document. I would especially
The utility of temporal abstraction in reinforcement learning
- Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems
, 2008
"... The hierarchical structure of real-world problems has motivated extensive research into temporal abstractions for reinforcement learning, but precisely how these abstractions allow agents to improve their learning performance is not well understood. This paper investigates the connection between tem ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
The hierarchical structure of real-world problems has motivated extensive research into temporal abstractions for reinforcement learning, but precisely how these abstractions allow agents to improve their learning performance is not well understood. This paper investigates the connection between temporal abstraction and an agent’s exploration policy, which determines how the agent’s performance improves over time. Experimental results with standard methods for incorporating temporal abstractions show that these methods benefit learning only in limited contexts. The primary contribution of this paper is a clearer understanding of how hierarchical decompositions interact with reinforcement learning algorithms, with important consequences for the manual design or automatic discovery of action hierarchies.
Self-Organizing Distinctive State Abstraction Using Options ∗
"... An important problem in developmental robotics is the automatic learning of motor routines or behaviors without human guidance. This paper presents Self-Organizing Distinctive-State Abstraction (SODA), a method by which a robot can learn a set of sensory features and reusable motor routines from raw ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
An important problem in developmental robotics is the automatic learning of motor routines or behaviors without human guidance. This paper presents Self-Organizing Distinctive-State Abstraction (SODA), a method by which a robot can learn a set of sensory features and reusable motor routines from raw sensorimotor experience. Experiments show that the learned versions of the motor routines outperform hard-coded alternatives, and that robots using the learned routines can learn tasks much more quickly than when using primitive actions. The features and motor routines are learned autonomously, and reflect only the environment and the agent’s sensorimotor capabilities, without external direction. 1.
Hierarchical Model-Based Reinforcement Learning: R-MAX + MAXQ
"... Hierarchical decomposition promises to help scale reinforcement learning algorithms naturally to real-world problems by exploiting their underlying structure. Model-based algorithms, which provided the first finite-time convergence guarantees for reinforcement learning, may also play an important ro ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Hierarchical decomposition promises to help scale reinforcement learning algorithms naturally to real-world problems by exploiting their underlying structure. Model-based algorithms, which provided the first finite-time convergence guarantees for reinforcement learning, may also play an important role in coping with the relative scarcity of data in large environments. In this paper, we introduce an algorithm that fully integrates modern hierarchical and model-learning methods in the standard reinforcement learning setting. Our algorithm, R-MAXQ, inherits the efficient modelbased exploration of the R-MAX algorithm and the opportunities for abstraction provided by the MAXQ framework. We analyze the sample complexity of our algorithm, and our experiments in a standard simulation environment illustrate the advantages of combining hierarchies and models. 1.
Automatic Induction of MAXQ Hierarchies
"... Scaling up reinforcement learning to large domains requires leveraging the structure in the domain. Hierarchical reinforcement learning has been one of the ways in which the domain structure is exploited to constrain the value function space of the learner, and speed up learning[10, 3, 1]. In the MA ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Scaling up reinforcement learning to large domains requires leveraging the structure in the domain. Hierarchical reinforcement learning has been one of the ways in which the domain structure is exploited to constrain the value function space of the learner, and speed up learning[10, 3, 1]. In the MAXQ framework, for example, a task hierarchy is defined, and a set of relevant features to represent the completion function for each task-subtask pair are given [3], resulting in decomposed subtask-specific value functions that are easier to learn than the global value function. The MAXQ decomposition facilitates learning separate value functions for subtasks. The task hierarchy is represented as a directed acyclic graph. The leaf nodes are the primitive subtasks. Each composite subtask defines a semi-Markov Decision Process (SMDP) with a set of actions (which may include primitive actions or other subtasks), a set of state variables, a termination predicate which defines a set of exit states for the subtask, and a pseudo-reward function defined over the exits. Several researchers have focused on the problem of automatically inducing temporally extended actions and task-subtask hierarchies [4, 7, 8, 9, 2, 11, 6, 5]. Discovering tasksubtask

