Results 1 - 10 of 134
Transfer learning via inter-task mappings for temporal difference learning
- Journal of Machine Learning Research
"... Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirab ..."
Abstract
-
Cited by 66 (20 self)
- Add to MetaCart
Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. This empirical result has motivated the development of many methods that speed up reinforcement learning by modifying a task for the learner or helping the learner better generalize to novel situations. This article focuses on generalizing across tasks, thereby speeding up learning, via a novel form of transfer using handcoded task relationships. We compare learning on a complex task with three function approximators, a cerebellar model arithmetic computer (CMAC), an artificial neural network (ANN), and a radial basis function (RBF), and empirically demonstrate that directly transferring the action-value function can lead to a dramatic speedup in learning with all three. Using transfer via inter-task mapping (TVITM), agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup soccer Keepaway domain. This article contains and extends material published in two conference papers (Taylor and Stone, 2005; Taylor et al., 2005).
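To make the inter-task mapping idea concrete, the sketch below seeds a target-task action-value function from a learned source-task one. It is a minimal illustration, not the paper's implementation: the tabular Q-values stand in for the CMAC/ANN/RBF approximators, and the hand-coded mappings chi_S and chi_A are hypothetical placeholders.

    from collections import defaultdict

    def transfer_q_values(source_q, target_states, target_actions, chi_S, chi_A):
        """Seed a target-task Q-function from a learned source-task Q-function.

        chi_S maps a target state to its analogous source state and chi_A maps a
        target action to its analogous source action (both hand-coded here).
        """
        target_q = defaultdict(float)
        for s in target_states:
            for a in target_actions:
                # Copy the value of the corresponding source state-action pair.
                target_q[(s, a)] = source_q.get((chi_S(s), chi_A(a)), 0.0)
        return target_q

    # Toy usage: a 2-state source task transferred into a 3-state target task.
    source_q = {("near", "hold"): 1.0, ("near", "pass"): 0.5,
                ("far", "hold"): 0.2, ("far", "pass"): 0.8}
    chi_S = lambda s: "near" if s in ("near", "mid") else "far"   # hypothetical state mapping
    chi_A = lambda a: a                                           # actions assumed shared
    target_q = transfer_q_values(source_q, ["near", "mid", "far"], ["hold", "pass"], chi_S, chi_A)

The transferred values only initialize learning in the target task; temporal difference updates then proceed as usual from that starting point.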
Keepaway soccer: From machine learning testbed to benchmark
- RoboCup-2005: Robot Soccer World Cup IX, 2006
"... Abstract. Keepaway soccer has been previously put forth as a testbed for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources de ..."
Abstract
-
Cited by 62 (24 self)
- Add to MetaCart
(Show Context)
Keepaway soccer has been previously put forth as a testbed for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources designed to make the domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server. In addition, we report on new experiments in the Keepaway domain, along with performance results designed to be directly comparable with future experimental results. Combined, the new infrastructure and our concrete demonstration of its use in comparative experiments elevate the domain to a machine learning benchmark, suitable for use by researchers across the field.
Transfer Learning for Reinforcement Learning Domains: A Survey
"... The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of ..."
Abstract
-
Cited by 48 (6 self)
- Add to MetaCart
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Using homomorphisms to transfer options across continuous reinforcement learning domains
- In AAAI Conference on Artificial Intelligence, 2006
"... We examine the problem of Transfer in Reinforcement Learning and present a method to utilize knowledge ac-quired in one Markov Decision Process (MDP) to boot-strap learning in a more complex but related MDP. We build on work in model minimization in Reinforcement Learning to define relationships bet ..."
Abstract
-
Cited by 43 (2 self)
- Add to MetaCart
We examine the problem of Transfer in Reinforcement Learning and present a method to utilize knowledge acquired in one Markov Decision Process (MDP) to bootstrap learning in a more complex but related MDP. We build on work in model minimization in Reinforcement Learning to define relationships between state-action pairs of the two MDPs. Our main contribution in this work is to provide a way to compactly represent such mappings using relationships between state variables in the two domains. We use these functions to transfer a learned policy in the first domain into an option in the new domain, and apply intra-option learning methods to bootstrap learning in the new domain. We first evaluate our approach in the well known Blocksworld domain. We then demonstrate that our approach to transfer is viable in a complex domain with a continuous state space by evaluating it in the Robosoccer Keepaway domain.
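As a rough illustration of the transfer step described above, the sketch below wraps a source-task policy as an option in the target task by routing target states through a state mapping. The mappings, termination test, and discrete toy usage are illustrative assumptions, not the paper's homomorphism construction.

    class TransferredOption:
        """Reuse a learned source-task policy as an option in a related target task."""

        def __init__(self, source_policy, map_state, map_action, terminate):
            self.source_policy = source_policy   # source state -> source action
            self.map_state = map_state           # target state -> source state
            self.map_action = map_action         # source action -> target action
            self.terminate = terminate           # target state -> bool (option ends)

        def act(self, target_state):
            # Translate the target state, query the old policy, translate the action back.
            source_state = self.map_state(target_state)
            return self.map_action(self.source_policy(source_state))

    # Toy usage: a source policy over (x, y) reused in a target task whose states
    # carry an extra variable that the option simply ignores.
    option = TransferredOption(
        source_policy=lambda s: "move_up",
        map_state=lambda s: s[:2],                 # hypothetical state-variable mapping
        map_action=lambda a: a,
        terminate=lambda s: s[2] == "done",
    )
    print(option.act((1, 2, "running")))           # -> "move_up"

While such an option executes in the target task, intra-option learning methods can keep updating the target task's own value estimates, which is how the transferred behavior bootstraps learning.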
Transfer via inter-task mappings in policy search reinforcement learning.
- In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007
"... ABSTRACT The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks wit ..."
Abstract
-
Cited by 42 (19 self)
- Add to MetaCart
(Show Context)
The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.
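A minimal sketch of what such a transfer functional might look like, using linear policies (weight matrices) as stand-ins for the paper's neural networks; the feature and action mappings below are hypothetical, and incomplete mappings are handled simply by leaving novel entries at zero.

    import numpy as np

    def transfer_policy_weights(source_W, feature_map, action_map,
                                n_target_features, n_target_actions):
        """Expand a source-task policy (actions x features weight matrix) to target sizes.

        feature_map[i] is the source feature analogous to target feature i (None if novel);
        action_map[j] is the source action analogous to target action j (None if novel).
        """
        target_W = np.zeros((n_target_actions, n_target_features))
        for ta, sa in enumerate(action_map):
            if sa is None:
                continue                      # novel action: weights stay at zero
            for tf, sf in enumerate(feature_map):
                if sf is not None:
                    target_W[ta, tf] = source_W[sa, sf]
        return target_W

    # Toy usage: translate a whole population of 3-feature, 2-action policies
    # into 4-feature, 3-action policies (hypothetical mappings).
    population = [np.random.randn(2, 3) for _ in range(5)]
    feature_map = [0, 1, 2, 1]                # new feature copies an analogous old one
    action_map = [0, 1, 1]                    # new action copies an analogous old one
    target_population = [transfer_policy_weights(W, feature_map, action_map, 4, 3)
                         for W in population]

Applying the functional to every member of the source population gives the policy search in the target task an informed starting population instead of a random one.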
Transferring instances for model-based reinforcement learning
- AAMAS 2008 Workshop on Adaptive Learning Agents and Multi-Agent Systems, 2008
"... Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning ..."
Abstract
-
Cited by 42 (10 self)
- Add to MetaCart
Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample complexity and asymptotic performance of a model-based algorithm when learning in a continuous state space.
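The sketch below illustrates the general instance-transfer idea: recorded source-task transitions are mapped into the target task so that a model can make predictions before any target-task data exist. The inter-task mappings and the crude nearest-neighbour model are illustrative assumptions, not the TIMBREL algorithm itself.

    def translate_instances(source_transitions, map_state, map_action):
        """Map recorded (s, a, r, s') source-task transitions into the target task."""
        return [(map_state(s), map_action(a), r, map_state(s2))
                for (s, a, r, s2) in source_transitions]

    def predict_next_state(instances, state, action):
        """One-nearest-neighbour model over (possibly transferred) instances."""
        candidates = [(abs(s - state), s2) for (s, a, r, s2) in instances if a == action]
        return min(candidates)[1] if candidates else None

    # Toy usage: 1-D source transitions rescaled into a 1-D target task.
    source_transitions = [(0.0, "step", 0.0, 0.1), (0.1, "step", 0.0, 0.2)]
    transferred = translate_instances(source_transitions,
                                      map_state=lambda s: 2.0 * s,   # hypothetical mapping
                                      map_action=lambda a: a)
    print(predict_next_state(transferred, 0.05, "step"))             # -> 0.2

In a full method, transferred instances would be combined with (and gradually superseded by) real target-task experience as it accumulates.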
Comparing evolutionary and temporal difference methods in a reinforcement learning domain
- In GECCO 2006: Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1321-1328), 2006
"... Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods ’ relative strengths and weaknesses. ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
(Show Context)
Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods' relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT [19], a GA that evolves neural networks, with Sarsa [16, 17], a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.
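For readers unfamiliar with the TD side of this comparison, Sarsa's on-policy update is Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)). A minimal tabular sketch follows; the epsilon-greedy policy and toy numbers are illustrative, and Keepaway itself requires function approximation rather than a table.

    import random
    from collections import defaultdict

    def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
        """One on-policy TD update: move Q(s, a) toward r + gamma * Q(s', a')."""
        Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

    def epsilon_greedy(Q, s, actions, epsilon=0.1):
        """Behaviour policy used both to act and to pick the bootstrap action a'."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    # Toy usage on a single transition.
    Q = defaultdict(float)
    a2 = epsilon_greedy(Q, "far", ["hold", "pass"])
    sarsa_update(Q, s="near", a="hold", r=1.0, s2="far", a2=a2)
    print(Q[("near", "hold")])   # 0.1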
Value Functions for RL-Based Behavior Transfer: A Comparative Study
- In Proceedings of the 20th National Conference on Artificial Intelligence, 2005
"... Temporal difference (TD) learning methods (Sutton & Barto 1998) have become popular reinforcement learning techniques in recent years. TD methods, relying on function approxima-tors to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit so ..."
Abstract
-
Cited by 35 (8 self)
- Add to MetaCart
(Show Context)
Temporal difference (TD) learning methods (Sutton & Barto 1998) have become popular reinforcement learning techniques in recent years. TD methods, relying on function approxima-tors to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generaliz-ing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup-soccer keepaway domain.
Reinforcement learning for robot soccer
2009
"... Batch reinforcement learning methods provide a powerful framework for learning efficiently and effectively in autonomous robots. The paper reviews some recent work of the authors aiming at the successful application of reinforcement learning in a challenging and complex domain. It discusses several ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
Batch reinforcement learning methods provide a powerful framework for learning efficiently and effectively in autonomous robots. The paper reviews some recent work of the authors aiming at the successful application of reinforcement learning in a challenging and complex domain. It discusses several variants of the general batch learning framework, particularly tailored to the use of multilayer perceptrons to approximate value functions over continuous state spaces. The batch learning framework is successfully used to learn crucial skills in our soccer-playing robots participating in the RoboCup competitions. This is demonstrated on three different case studies.
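A minimal sketch of one batch-RL loop in the fitted-Q style, with scikit-learn's MLPRegressor standing in for the multilayer perceptrons mentioned above; it illustrates the general framework (store transitions, repeatedly regress onto bootstrapped targets), not the authors' specific variants.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def fitted_q(transitions, actions, n_iterations=10, gamma=0.95):
        """Batch RL sketch: regress Q(s, a) onto r + gamma * max_a' Q(s', a') repeatedly.

        `transitions` is a list of (state_vector, action_index, reward, next_state_vector)
        collected beforehand; no new environment interaction happens during fitting.
        """
        X = np.array([np.append(s, a) for (s, a, _, _) in transitions])
        targets = np.array([r for (_, _, r, _) in transitions])   # first pass: rewards only
        q = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000)
        for _ in range(n_iterations):
            q.fit(X, targets)
            targets = np.array([
                r + gamma * max(q.predict([np.append(s2, a2)])[0] for a2 in actions)
                for (_, _, r, s2) in transitions
            ])
        return q

    # Toy usage: two 1-D transitions with two discrete actions (0 and 1).
    data = [(np.array([0.0]), 0, 0.0, np.array([0.5])),
            (np.array([0.5]), 1, 1.0, np.array([1.0]))]
    q = fitted_q(data, actions=[0, 1], n_iterations=3)

Because the network is refit on the whole stored batch each iteration, the same experience is reused many times, which is what makes batch methods comparatively data-efficient on real robots.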
Adaptive Tile Coding for Value Function Approximation
"... Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depe ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents adaptive tile coding, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator’s level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.
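A minimal one-dimensional sketch of the splitting mechanism: start with coarse tiles, and replace a tile with two halves once a split criterion fires. The criterion used here (a fixed update count) is a deliberately simple stand-in for the value- and policy-based splitting criteria the paper actually compares.

    class AdaptiveTiles1D:
        """1-D adaptive tile coding: each tile holds a value estimate and is split
        into two halves after `split_after` updates (illustrative criterion)."""

        def __init__(self, lo, hi, n_tiles=2, split_after=50):
            step = (hi - lo) / n_tiles
            self.tiles = [[lo + i * step, lo + (i + 1) * step, 0.0, 0] for i in range(n_tiles)]
            self.split_after = split_after

        def _find(self, x):
            for i, (lo, hi, _, _) in enumerate(self.tiles):
                if lo <= x < hi:
                    return i
            return len(self.tiles) - 1

        def value(self, x):
            return self.tiles[self._find(x)][2]

        def update(self, x, target, alpha=0.1):
            i = self._find(x)
            lo, hi, v, visits = self.tiles[i]
            v += alpha * (target - v)
            visits += 1
            if visits >= self.split_after:
                mid = (lo + hi) / 2.0      # refine the representation where it is used most
                self.tiles[i:i + 1] = [[lo, mid, v, 0], [mid, hi, v, 0]]
            else:
                self.tiles[i] = [lo, hi, v, visits]

    # Toy usage: the region around x = 0.1 is refined first because it is updated most often.
    approx = AdaptiveTiles1D(0.0, 1.0)
    for _ in range(200):
        approx.update(0.1, target=1.0)
    print(len(approx.tiles), approx.value(0.1))

Starting coarse and splitting only where updates concentrate is what lets the approximator begin with broad generalization and reduce it over time, as the abstract describes.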