Results 1–10 of 13
Generalized Model Learning for Reinforcement Learning on a Humanoid Robot
Cited by 17 (8 self)
Abstract—Reinforcement learning (RL) algorithms have long been promising methods for enabling an autonomous robot to improve its behavior on sequential decision-making tasks. The obvious enticement is that the robot should be able to improve its own behavior without the need for detailed step-by-step programming. However, for RL to reach its full potential, the algorithms must be sample efficient: they must learn competent behavior from very few real-world trials. From this perspective, model-based methods, which use experiential data more efficiently than model-free approaches, are appealing. But they often require exhaustive exploration to learn an accurate model of the domain. In this paper, we present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees to learn the model by generalizing the relative effect of actions across states. The agent explores the environment until it believes it has a reasonable policy. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. We compare RL-DT against standard model-free and model-based learning methods, and demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring goals in a penalty kick scenario.
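The key idea in this abstract — modeling the *relative* effect of an action so one observation generalizes across states — can be sketched minimally. This is an illustrative toy, not the paper's decision-tree implementation; the class and state encoding are assumptions:

```python
from collections import Counter, defaultdict

# Illustrative sketch (not the paper's RL-DT): model the relative effect
# of each action (next_state - state) rather than absolute next states,
# so a single observed transition generalizes to every state in a 1-D world.
class RelativeEffectModel:
    def __init__(self):
        self.effects = defaultdict(Counter)  # action -> Counter of observed deltas

    def update(self, state, action, next_state):
        self.effects[action][next_state - state] += 1

    def predict(self, state, action):
        if not self.effects[action]:
            return state  # never-seen action: assume no effect
        # Apply the most frequently observed relative effect to any state.
        delta, _ = self.effects[action].most_common(1)[0]
        return state + delta

model = RelativeEffectModel()
model.update(3, "right", 4)           # one sample observed at state 3 ...
print(model.predict(7, "right"))      # ... predicts at state 7 as well -> 8
```

A decision tree over state features plays the role of the `Counter` here, letting the learned effect vary with context instead of being globally constant.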
High-level Reinforcement Learning in Strategy Games
Cited by 9 (0 self)
Video games provide a rich testbed for artificial intelligence methods. In particular, creating automated opponents that perform well in strategy games is a difficult task. For instance, human players rapidly discover and exploit the weaknesses of hard-coded strategies. To build better strategies, we suggest a reinforcement learning approach for learning a policy that switches between high-level strategies. These strategies are chosen based on different game situations and a fixed opponent strategy. As the results demonstrate, our learning agents are able to rapidly adapt to fixed opponents and improve deficiencies in the hard-coded strategies.
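The approach described — learning which high-level strategy to switch to in each game situation — amounts to tabular RL where the actions are strategies. A minimal Q-learning sketch under assumed names (the strategy and situation labels are invented for illustration, not from the paper):

```python
import random

# Illustrative: tabular Q-learning where "actions" are high-level strategies.
STRATEGIES = ["rush", "defend", "expand"]

def choose(q, situation, epsilon=0.1):
    # epsilon-greedy selection of a strategy for the current game situation.
    if random.random() < epsilon:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: q.get((situation, s), 0.0))

def update(q, situation, strategy, reward, next_situation, alpha=0.5, gamma=0.9):
    best_next = max(q.get((next_situation, s), 0.0) for s in STRATEGIES)
    key = (situation, strategy)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (reward + gamma * best_next - old)

q = {}
# Toy experience: "defend" pays off when under attack, "rush" does not.
for _ in range(200):
    update(q, "under_attack", "defend", 1.0, "safe")
    update(q, "under_attack", "rush", -1.0, "safe")
print(choose(q, "under_attack", epsilon=0.0))  # -> "defend"
```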
Real Time Targeted Exploration in Large Domains
Cited by 6 (4 self)
Abstract—A developing agent needs to explore to learn about the world and learn good behaviors. In many real-world tasks, this exploration can take far too long, and the agent must make decisions about which states to explore and which states not to explore. Bayesian methods attempt to address this problem, but take too much computation time to run in reasonably sized domains. In this paper, we present TEXPLORE, the first algorithm to perform targeted exploration in real time in large domains. The algorithm learns multiple possible models of the domain that generalize action effects across states. We experiment with possible ways of adding intrinsic motivation to the agent to drive exploration. TEXPLORE is fully implemented and tested in a novel domain called Fuel World that is designed to reflect the type of targeted exploration needed in the real world. We show that our algorithm significantly outperforms representative examples of both model-free and model-based RL algorithms from the literature and is able to quickly learn to perform well in a large world in real time.
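One way to read "learns multiple possible models" plus "intrinsic motivation" is that disagreement among the models marks where exploration is worthwhile. A toy sketch of that idea (the model representation here is invented for illustration; TEXPLORE itself uses more capable learned models):

```python
import statistics

# Illustrative sketch: an ensemble of crude transition models whose
# disagreement serves as an intrinsic exploration bonus.
class ModelEnsemble:
    def __init__(self, n_models=5):
        # Each "model" maps (state, action) to a predicted state delta.
        self.models = [dict() for _ in range(n_models)]

    def update(self, state, action, next_state, model_idx):
        # Each transition trains only one model, like bootstrap resampling.
        self.models[model_idx][(state, action)] = next_state - state

    def disagreement(self, state, action):
        preds = [m.get((state, action), 0) for m in self.models]
        return statistics.pstdev(preds)  # high where the models conflict

ens = ModelEnsemble()
for i in range(5):
    ens.update(0, "a", 1, model_idx=i)   # all models agree at (0, "a")
ens.update(9, "a", 5, model_idx=0)       # only one model has seen (9, "a")
print(ens.disagreement(0, "a"))          # 0.0: well-known, no bonus
print(ens.disagreement(9, "a") > 0)      # True: novel, worth exploring
```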
Autonomous Qualitative Learning of Distinctions and Actions in a Developing Agent
Cited by 6 (3 self)
How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain-general knowledge? This thesis attacks a piece of this problem: it assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and poses a solution to the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. There exist methods for learning models of the environment, and there also exist methods for planning. However, for autonomous learning, these methods have been used almost exclusively in discrete environments. This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete representations. In this approach, the agent begins with a broad discretization and initially can only tell whether the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment. It converts these models into plans to form actions, and uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot sits at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine whether it learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
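The initial "broad discretization" described above — classifying each continuous variable as increasing, decreasing, or steady — is simple to sketch. The threshold value below is an assumption, not from the thesis:

```python
# Minimal sketch of the initial qualitative discretization: the agent only
# distinguishes whether each continuous variable is increasing, decreasing,
# or steady (steady_eps is an assumed noise threshold).
def qualitative_direction(prev, curr, steady_eps=1e-3):
    delta = curr - prev
    if delta > steady_eps:
        return "increasing"
    if delta < -steady_eps:
        return "decreasing"
    return "steady"

# A trajectory of one continuous variable becomes a qualitative sequence.
values = [0.00, 0.00, 0.25, 0.50, 0.50, 0.30]
print([qualitative_direction(a, b) for a, b in zip(values, values[1:])])
# -> ['steady', 'increasing', 'increasing', 'steady', 'decreasing']
```

Learning then refines this coarse partition with additional landmark values, which is where the simultaneous discretization/model learning in the thesis comes in.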
“How Hard Is My MDP?” The Distribution-Norm to the Rescue
in Advances in Neural Information Processing Systems
Cited by 2 (0 self)
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s, a), one can only transit to states with the same value, learning p(·|s, a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) based on what we call the distribution-norm. The distribution-norm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the problem-free, loose ‖·‖1 concentration inequalities used in most previous analyses of RL algorithms with a tighter problem-dependent hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.
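The abstract describes the norm only in words; written out, a definition consistent with that wording would read as follows (the notation is assumed, not taken from the paper):

```latex
% Distribution-norm of a zero nu-mean function f w.r.t. a measure nu:
% its "standard variation" under nu, which for a zero-mean f is
\|f\|_{\nu} \;=\; \sqrt{\mathbb{E}_{\nu}\!\left[f^{2}\right]}
\qquad \text{for } f \text{ with } \mathbb{E}_{\nu}[f] = 0 .
```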
Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains
Cited by 1 (0 self)
The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate. Accurate estimates of an agent’s confidence are useful for many applications, such as biasing exploration and automatically adjusting parameters to reduce dependence on parameter tuning. Computing confidence intervals on reinforcement learning value estimates, however, is challenging because data generated by the agent-environment interaction rarely satisfies traditional assumptions. Samples of value estimates are dependent, likely non-normally distributed and often limited, particularly in early learning when confidence estimates are pivotal. In this work, we investigate how to compute robust confidence estimates for value estimates in continuous Markov decision processes. We illustrate how to use bootstrapping to compute confidence intervals online under a changing policy (previously not possible) and prove validity under a few reasonable assumptions. We demonstrate the applicability of our confidence estimation algorithms with experiments on exploration, parameter estimation and tracking.
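Bootstrapping is attractive here precisely because it makes no normality assumption. A generic percentile-bootstrap sketch (the standard offline technique, not the paper's online, changing-policy algorithm; the sample data is invented):

```python
import random
import statistics

# Standard percentile-bootstrap confidence interval for the mean of a
# small, possibly non-normal sample of value estimates.
def bootstrap_ci(samples, n_boot=2000, alpha=0.05, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# A few observed returns; no distributional assumptions needed.
returns = [1.0, 0.0, 2.5, 1.5, 0.5, 3.0, 1.0, 2.0]
lo, hi = bootstrap_ci(returns)
print(f"95% CI for the mean return: [{lo:.2f}, {hi:.2f}]")
```

The hard part the paper addresses is that RL samples are *dependent* and generated under a changing policy, which plain resampling like this does not account for.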
TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains
, 2012
Directed Exploration in Reinforcement Learning with Transferred Knowledge
in European Workshop on Reinforcement Learning
, 2012
Experimental results suggest that transferred knowledge can reduce the number of exploratory actions needed by reinforcement learning (RL) algorithms to find acceptable solutions in Markov decision processes, compared to learning from scratch. However, most existing transfer learning algorithms for RL are heuristic, and transferred knowledge may unexpectedly result in worse performance than learning from scratch (i.e., negative transfer). We introduce a transfer learning algorithm that employs directed exploration, which allows us to motivate our algorithm by analyzing its sample complexity of exploration in the target task. We define positive and negative transfer from a sample-complexity perspective and provide conditions under which our algorithm will avoid negative transfer, as well as conditions under which it guarantees positive transfer, with high probability. Finally, we demonstrate the advantages of our algorithm experimentally.