Results 1–10 of 28
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
Abstract

Cited by 92 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
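The basis construction in (ii) can be sketched in a few lines: build the adjacency graph of a small gridworld, form the combinatorial graph Laplacian, and take its smoothest eigenvectors as basis functions. The grid size and number of basis functions below are illustrative choices, not values from the paper.

```python
import numpy as np

# Sketch of proto-value function construction on a 5x5 gridworld.
N = 5
num_states = N * N

# Adjacency matrix W of the undirected state graph: two states are
# connected if a single up/down/left/right move links them.
W = np.zeros((num_states, num_states))
for r in range(N):
    for c in range(N):
        s = r * N + c
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < N and 0 <= cc < N:
                W[s, rr * N + cc] = 1.0

# Combinatorial graph Laplacian L = D - W, a symmetric diffusion operator.
D = np.diag(W.sum(axis=1))
L = D - W

# Proto-value functions: the eigenvectors of L with the smallest
# eigenvalues (the smoothest functions on the graph) form the basis.
eigvals, eigvecs = np.linalg.eigh(L)
k = 10                      # number of basis functions (illustrative)
pvf_basis = eigvecs[:, :k]  # shape (num_states, k)

# A value function is then approximated as V ≈ pvf_basis @ weights, with
# the weights fit by, e.g., least-squares policy iteration.
print(pvf_basis.shape)  # (25, 10)
```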
Probabilistic Policy Reuse in a Reinforcement Learning Agent
In AAMAS ’06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, 2006
Abstract

Cited by 60 (3 self)
We contribute Policy Reuse as a technique to improve a reinforcement learning agent with guidance from past learned similar policies. Our method relies on using the past policies as a probabilistic bias where the learning agent faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce the algorithm and its major components: an exploration strategy to include the new reuse bias, and a similarity function to estimate the similarity of past policies with respect to a new one. We provide empirical results demonstrating that Policy Reuse improves the learning performance over different strategies that learn without reuse. Interestingly and almost as a side effect, Policy Reuse also identifies classes of similar policies revealing a basis of core policies of the domain. We demonstrate that such a basis can be built incrementally, contributing to the learning of the structure of the domain.
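The three-way choice the abstract describes (exploit the policy being learned, explore a random action, or exploit a past policy) can be sketched as a single action-selection function. The parameter names `psi` and `epsilon` and the dictionary-based Q-values are illustrative assumptions, and the decay schedules used in the paper are omitted.

```python
import random

def pi_reuse_action(q_values, past_policy_action, psi, epsilon, actions):
    """One action choice under a Policy-Reuse-style exploration strategy.

    With probability psi, exploit the past policy; otherwise act
    epsilon-greedily on the Q-values of the policy being learned.
    """
    if random.random() < psi:
        return past_policy_action               # reuse the past policy
    if random.random() < epsilon:
        return random.choice(actions)           # explore a random action
    return max(actions, key=lambda a: q_values[a])  # exploit current policy

# Usage: with psi=0 and epsilon=0 the greedy action is taken; with psi=1
# the past policy's action is always reused.
q = {"left": 0.1, "right": 0.9}
print(pi_reuse_action(q, "left", psi=0.0, epsilon=0.0,
                      actions=["left", "right"]))  # right
```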
Using advice to transfer knowledge acquired in one reinforcement learning task to another
In Proceedings of the Sixteenth European Conference on Machine Learning, 2005
Abstract

Cited by 51 (11 self)
We present a method for transferring knowledge learned in one task to a related task. Our problem solvers employ reinforcement learning to acquire a model for one task. We then transform that learned model into advice for a new task. A human teacher provides a mapping from the old task to the new task to guide this knowledge transfer. Advice is incorporated into our problem solver using a knowledge-based support vector regression method that we previously developed. This advice-taking approach allows the problem solver to refine or even discard the transferred knowledge based on its subsequent experiences. We empirically demonstrate the effectiveness of our approach with two games from the RoboCup soccer simulator: KeepAway and BreakAway. Our results demonstrate that a problem solver learning to play BreakAway using advice extracted from KeepAway outperforms a problem solver learning without the benefit of such advice.
Transfer Learning for Reinforcement Learning Domains: A Survey
Abstract

Cited by 48 (6 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Efficient reinforcement learning with relocatable action models
In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), 2007
Abstract

Cited by 43 (18 self)
Realistic domains for learning possess regularities that make it possible to generalize experience across related states. This paper explores an environment-modeling framework that represents transitions as state-independent outcomes that are common to all states that share the same type. We analyze a set of novel learning problems that arise in this framework, providing lower and upper bounds. We single out one particular variant of practical interest and provide an efficient algorithm and experimental results in both simulated and robotic environments.
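The core idea of keying transitions by state type rather than by individual state can be sketched as follows; the type names, outcomes, and grid example are illustrative, not taken from the paper.

```python
# Sketch of a relocatable-action-model style transition model: outcomes
# are relative moves keyed by (state type, action), shared by all states
# of the same type.
outcome_model = {
    ("open", "east"): (0, 1),       # open cell: moving east shifts column +1
    ("wall_east", "east"): (0, 0),  # eastern wall: the move is blocked
}

def state_type(state, walls_east):
    """Classify a grid cell by whether an eastern wall blocks it."""
    return "wall_east" if state in walls_east else "open"

def next_state(state, action, walls_east):
    """Apply the state-independent outcome for this state's type."""
    dr, dc = outcome_model[(state_type(state, walls_east), action)]
    return (state[0] + dr, state[1] + dc)

# Two different open cells generalize to the same learned outcome:
print(next_state((0, 0), "east", walls_east={(2, 2)}))  # (0, 1)
print(next_state((4, 1), "east", walls_east={(2, 2)}))  # (4, 2)
```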
Efficient learning of relational models for sequential decision making
2010
Abstract

Cited by 8 (1 self)
The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting. These results are extended in an apprenticeship learning paradigm where an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting.
Model transfer for Markov decision tasks via parameter matching
Workshop of the UK Planning and Scheduling Special Interest Group, 2006
Abstract

Cited by 7 (1 self)
Model transfer refers to the process of adjusting a model that was previously identified for one task (source) so that it can be used for a new, related task (target). For decision tasks in unknown Markov environments, we profit through model transfer by using information from related tasks, e.g. transition knowledge and solution (policy) knowledge, to quickly determine an appropriate model of the new task environment. A difficulty with such transfer is typically the nonlinear and indirect relationship between the available source knowledge and the target’s working prior model of the unknown environment, provided through a complex multidimensional transfer function. In this paper, we take a Bayesian view and present a probability perturbation method that conditions the target’s model parameters to a variety of source knowledge types. The method relies on preposterior distributions, which specify the distribution of the target’s parameter set given each individual knowledge type. The preposteriors are then combined to obtain a posterior distribution for the parameter set that matches all the available knowledge. The method is illustrated with an example.
Transfer of Task Representation in Reinforcement Learning using Policy-based Proto-value Functions
Abstract

Cited by 7 (0 self)
Reinforcement Learning research is traditionally devoted to solving single-task problems. Therefore, whenever a new task is faced, learning must be restarted from scratch. Recently, several studies have addressed the issue of reusing the knowledge acquired in solving previous related tasks by transferring information about policies and value functions. In this paper, we analyze the use of proto-value functions from the transfer learning perspective. Proto-value functions are effective basis functions for the approximation of value functions, defined over the graph obtained by a random walk on the environment. The definition of this graph is a key aspect in transfer problems in which both the reward function and the dynamics change. Therefore, we introduce policy-based proto-value functions, which can be obtained by considering the graph generated by a random walk guided by the optimal policy of one of the tasks at hand. We compare the effectiveness of policy-based and standard proto-value functions on different transfer problems defined on a simple gridworld environment.
Accelerating Search with Transferred Heuristics
Abstract

Cited by 4 (0 self)
A common goal for transfer learning research is to show that a learner can solve a source task and then leverage the learned knowledge to solve a target task faster than if it had learned the target task directly. A more difficult goal is to reduce the total training time so that learning the source task and target task is faster than learning only the target task. This paper addresses the second goal by proposing a transfer hierarchy for 2-player games. Such a hierarchy orders games in terms of relative solution difficulty and can be used to select source tasks that are faster to learn than a given target task. We empirically test transfer between two types of tasks in the General Game Playing domain, the testbed for an international competition developed at Stanford. Our results show that transferring learned search heuristics from tasks in different parts of the hierarchy can significantly speed up search even when the source and target tasks differ along a number of important dimensions.
Reusing and building a policy library
In International Conference on Automated Planning and Scheduling, 2006
Abstract

Cited by 2 (1 self)
Policy Reuse is a method to improve reinforcement learning with the ability to solve multiple tasks by building upon past problem solving experience, as accumulated in a Policy Library. Given a new task, a Policy Reuse learner uses the past policies in the library as a probabilistic bias in its new learning process. We present how the effectiveness of each reuse episode is indicative of the novelty of the new task with respect to the previously solved ones in the policy library. In the paper we review Policy Reuse, and we introduce theoretical results that demonstrate that: (i) a Policy Library can be selectively and incrementally built while learning different problems; (ii) the Policy Library can be understood as a basis of the domain that represents its structure through a set of core policies; and (iii) given the basis of a domain, we can define a lower bound for its reuse gain.