Results 1  10
of
141
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract

Cited by 91 (6 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finitehorizon discrete POMDP is PSPACEcomplete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Reinforcement Learning as Classification: Leveraging Modern Classifiers
 in Proceedings of the Twentieth International Conference on Machine Learning
, 2003
"... The basic tools of machine learning appear in the inner loop of most reinforcement learning algorithms, typically in the form of Monte Carlo methods or function approximation techniques. ..."
Abstract

Cited by 84 (5 self)
 Add to MetaCart
The basic tools of machine learning appear in the inner loop of most reinforcement learning algorithms, typically in the form of Monte Carlo methods or function approximation techniques.
Transfer learning via intertask mappings for temporal difference learning
 Journal of Machine Learning Research
"... Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirab ..."
Abstract

Cited by 64 (20 self)
 Add to MetaCart
(Show Context)
Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. This empirical result has motivated the development of many methods that speed up reinforcement learning by modifying a task for the learner or helping the learner better generalize to novel situations. This article focuses on generalizing across tasks, thereby speeding up learning, via a novel form of transfer using handcoded task relationships. We compare learning on a complex task with three function approximators, a cerebellar model arithmetic computer (CMAC), an artificial neural network (ANN), and a radial basis function (RBF), and empirically demonstrate that directly transferring the actionvalue function can lead to a dramatic speedup in learning with all three. Using transfer via intertask mapping (TVITM), agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup soccer Keepaway domain. This article contains and extends material published in two conference papers (Taylor and Stone, 2005; Taylor et al., 2005).
Transfer Learning for Reinforcement Learning Domains: A Survey
"... The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of ..."
Abstract

Cited by 49 (6 self)
 Add to MetaCart
(Show Context)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Exploiting FirstOrder Regression in Inductive Policy Selection
 Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI’04
, 2004
"... We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function usi ..."
Abstract

Cited by 47 (2 self)
 Add to MetaCart
(Show Context)
We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using firstorder decisiontheoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning and in particular classical firstorder regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver’s attention on concepts that are specifically relevant to the optimal value function for the domain considered. 1
Bellman goes Relational
 In ICML
, 2004
"... Motivated by the interest in relational reinforcement learning, we introduce a novel relational Bellman update operator called ReBel. It employs a constraint logic programming language to compactly represent Markov decision processes over relational domains. ..."
Abstract

Cited by 46 (3 self)
 Add to MetaCart
Motivated by the interest in relational reinforcement learning, we introduce a novel relational Bellman update operator called ReBel. It employs a constraint logic programming language to compactly represent Markov decision processes over relational domains.
K.: Relational reinforcement learning: An overview
 In: Proceedings of the ICML’04 Workshop on Relational Reinforcement Learning
, 2004
"... Relational Reinforcement Learning (RRL) is both a young and an old eld. In this paper, we trace the history of the eld to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead. 1. ..."
Abstract

Cited by 39 (3 self)
 Add to MetaCart
(Show Context)
Relational Reinforcement Learning (RRL) is both a young and an old eld. In this paper, we trace the history of the eld to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead. 1.
Learning domainspecific control knowledge from random walks
 In Proceedings of the fourteenth international
, 2004
"... We describe and evaluate a system for learning domainspecific control knowledge. In particular, given a planning domain, the goal is to output a control policy that performs well on “long random walk ” problem distributions. The system is based on viewing planning domains as very large Markov decisi ..."
Abstract

Cited by 39 (4 self)
 Add to MetaCart
We describe and evaluate a system for learning domainspecific control knowledge. In particular, given a planning domain, the goal is to output a control policy that performs well on “long random walk ” problem distributions. The system is based on viewing planning domains as very large Markov decision processes and then applying a recent variant of approximate policy iteration that is bootstrapped with a new technique based on random walks. We evaluate the system on the AIPS2000 planning domains (among others) and show that often the learned policies perform well on problems drawn from the long–randomwalk distribution. In addition, we show that these policies often perform well on the original problem distributions from the domains involved. Our evaluation also uncovers limitations of our current system that point to future challenges.
Relating reinforcement learning performance to classification performance
 In 22nd International Conference on Machine Learning (ICML
, 2005
"... We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden state ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden states) and the resulting statement is independent of the number of states or actions. The statement is critically dependent on the size of the rewards and prediction performance of the created classifiers. We also provide some general guidelines for obtaining good classification performance on the created subproblems. In particular, we discuss possible methods for generating training examples for a classifier learning algorithm. 1.