Results 1 – 7 of 7
Sample Complexity of Multitask Reinforcement Learning
"... Transferring knowledge across a sequence of reinforcementlearning tasks is challenging, and has a number of important applications. Though there is encouraging empirical evidence that transfer can improve performance in subsequent reinforcementlearning tasks, there has been very little theoretical ..."
Abstract

Cited by 6 (1 self)
Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications. Though there is encouraging empirical evidence that transfer can improve performance in subsequent reinforcement-learning tasks, there has been very little theoretical analysis. In this paper, we introduce a new multitask algorithm for a sequence of reinforcement-learning tasks when each task is sampled independently from an (unknown) distribution over a finite set of Markov decision processes whose parameters are initially unknown. For this setting, we prove under certain assumptions that the per-task sample complexity of exploration is reduced significantly due to transfer compared to standard single-task algorithms. Our multitask algorithm also has the desired characteristic that it is guaranteed not to exhibit negative transfer: in the worst case its per-task sample complexity is comparable to the corresponding single-task algorithm.
Generalization and exploration via randomized value functions. arXiv preprint arXiv:1402.0635
 In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2014
"... We consider the problem of reinforcement learning with an orientation toward contexts in which an agent must generalize from past experience and explore to reduce uncertainty. We propose an approach to exploration based on randomized value functions and an algorithm – randomized leastsquares value ..."
Abstract

Cited by 2 (2 self)
We consider the problem of reinforcement learning with an orientation toward contexts in which an agent must generalize from past experience and explore to reduce uncertainty. We propose an approach to exploration based on randomized value functions and an algorithm – randomized least-squares value iteration (RLSVI) – that embodies this approach. We explain why versions of least-squares value iteration that use Boltzmann or greedy exploration can be highly inefficient, and present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Our experiments focus on learning over episodes of a finite-horizon Markov decision process and use a version of RLSVI designed for that task, but we also propose a version of RLSVI that addresses continual learning in an infinite-horizon discounted Markov decision process.
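The core idea the abstract describes – exploring by sampling a randomized value function rather than perturbing a greedy policy – can be sketched in the tabular, finite-horizon case. The following is a minimal illustration, not the paper's implementation: at each step h (working backwards) it fits a Bayesian linear regression of Bellman targets on one-hot (state, action) features and draws Q_h from the posterior. The hyperparameters `sigma` (noise scale) and `lam` (prior precision) and the function name are illustrative assumptions.

```python
import numpy as np

def rlsvi_sample_q(buffer, H, n_states, n_actions, sigma=1.0, lam=1.0, rng=None):
    """Sample one randomized value function in the spirit of tabular RLSVI.

    buffer: dict mapping timestep h -> list of (s, a, r, s_next) transitions.
    Returns Q of shape (H, n_states, n_actions), drawn from a per-step
    Bayesian linear-regression posterior over Bellman targets.
    """
    rng = rng or np.random.default_rng()
    d = n_states * n_actions
    Q = np.zeros((H, n_states, n_actions))
    for h in reversed(range(H)):
        A = lam * np.eye(d)        # posterior precision (prior: lam * I)
        b = np.zeros(d)
        for (s, a, r, s2) in buffer[h]:
            phi = np.zeros(d)      # one-hot feature for the (s, a) pair
            phi[s * n_actions + a] = 1.0
            # Bellman target uses the already-sampled Q at step h + 1
            target = r + (Q[h + 1, s2].max() if h + 1 < H else 0.0)
            A += np.outer(phi, phi) / sigma**2
            b += phi * target / sigma**2
        cov = np.linalg.inv(A)
        # One posterior draw per step; this randomness drives exploration
        Q[h] = rng.multivariate_normal(cov @ b, cov).reshape(n_states, n_actions)
    return Q
```

An episode then acts greedily with respect to the sampled Q; because each episode uses a fresh posterior draw, the agent explores in a Thompson-sampling-like way instead of via Boltzmann or greedy action noise.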
Efficient exploration and value function generalization in deterministic systems
 In Advances in Neural Information Processing Systems 26, 2013
"... We consider the problem of reinforcement learning over episodes of a finitehorizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true va ..."
Abstract

Cited by 2 (2 self)
We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function Q* lies within the hypothesis class Q, OCP selects optimal actions over all but at most dim_E[Q] episodes, where dim_E denotes the eluder dimension. We establish further efficiency and asymptotic performance guarantees that apply even if Q* does not lie in Q, for the special case where Q is the span of pre-specified indicator functions over disjoint sets.
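The mechanism the abstract summarizes becomes concrete in the lookup-table special case: because the system is deterministic, every observed transition pins down its Bellman value exactly, while unvisited (state, action) pairs keep an optimistic upper bound. The sketch below simplifies the paper's hypothesis-class formulation to that tabular case; the function name and the `vmax` bound are illustrative assumptions.

```python
from collections import defaultdict

def ocp_tabular(observed, H, n_actions, vmax):
    """Tabular sketch of optimistic constraint propagation.

    observed: dict mapping timestep h -> list of (s, a, r, s_next)
    transitions from a deterministic system.  Returns a list Q where
    Q[h][(s, a)] is an optimistic value: exact where a Bellman constraint
    has been observed, vmax elsewhere.
    """
    # Unconstrained (state, action) pairs default to the optimistic bound
    Q = [defaultdict(lambda: vmax) for _ in range(H)]
    for h in reversed(range(H)):
        for (s, a, r, s2) in observed.get(h, []):
            # Deterministic dynamics: the backup is an exact constraint,
            # optimistic only through the successor's unvisited actions
            v_next = 0.0 if h + 1 == H else max(
                Q[h + 1][(s2, a2)] for a2 in range(n_actions))
            Q[h][(s, a)] = r + v_next
    return Q
```

Acting greedily on such a Q either follows actions already known to be optimal or tries an optimistically valued unvisited pair, so each suboptimal episode resolves at least one new constraint – the intuition behind the dim_E[Q] episode bound.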
Rationality, Optimism and Guarantees in General Reinforcement Learning
, 2015
"... Abstract In this article, 1 we present a topdown theoretical study of general reinforcement learning agents. We begin with rational agents with unlimited resources and then move to a setting where an agent can only maintain a limited number of hypotheses and optimizes plans over a horizon much sho ..."
Abstract
In this article, we present a top-down theoretical study of general reinforcement learning agents. We begin with rational agents with unlimited resources and then move to a setting where an agent can only maintain a limited number of hypotheses and optimizes plans over a horizon much shorter than what the agent designer actually wants. We axiomatize what is rational in such a setting in a manner that enables optimism, which is important for achieving systematic explorative behavior. Then, within the class of agents deemed rational, we achieve convergence and finite-error bounds. Such results are desirable since they imply that the agent learns well from its experiences, but the bounds do not directly guarantee good performance and can be achieved by agents doing things one obviously should not. Good performance cannot in fact be guaranteed for any agent in fully general settings. Our approach is to design agents that learn well from experience and act rationally. We introduce a framework for general reinforcement learning agents based on rationality axioms for a decision function and a hypothesis-generating function designed so as to achieve guarantees on the number of errors. We consistently use an optimistic decision function, but the hypothesis-generating function needs to change depending on what is known or assumed. We investigate a number of natural situations having either a frequentist or Bayesian flavor, deterministic or stochastic environments, and either a finite or countable hypothesis class. Further, to achieve bounds good enough to hold promise for practical success, we introduce the notion of a class of environments being generated by a set of laws. None of the above has previously been done for fully general reinforcement learning environments.
Bayesian Reinforcement Learning with Exploration
"... Abstract. We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax samplecomplexity bounds in a very general class of (historybased) environments. We also prove lower bounds and show that the new al ..."
Abstract
Abstract. We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than the worst case.
Learning Agents with Evolving Hypothesis Classes
"... Abstract. It has recently been shown that a Bayesian agent with a universal hypothesis class resolves most induction problems discussed in the philosophy of science. These ideal agents are, however, neither practical nor a good model for how real science works. We here introduce a framework for lear ..."
Abstract
Abstract. It has recently been shown that a Bayesian agent with a universal hypothesis class resolves most induction problems discussed in the philosophy of science. These ideal agents are, however, neither practical nor a good model for how real science works. We here introduce a framework for learning based on implicit beliefs over all possible hypotheses and limited sets of explicit theories sampled from an implicit distribution represented only by the process by which it generates new hypotheses. We address the questions of how to act based on a limited set of theories and what an ideal sampling process should be like. Finally, we discuss topics in the philosophy of science and cognitive science from the perspective of this framework.