Results 1-10 of 10
Near-Optimal BRL using Optimistic Local Transitions (Extended Version)
Abstract

Cited by 10 (1 self)
Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., addressing the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity, and show that under certain parameters the algorithm is near-optimal in the Bayesian sense with high probability. Experimental results then highlight the key differences of this method compared to previous work.
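The abstract describes optimism applied to the transition function itself. A minimal sketch of that idea, assuming a tabular Dirichlet posterior over transitions (the bonus parameter `eta` and the value estimate used here are illustrative; the paper derives the actual bonus):

```python
import numpy as np

def bolt_optimistic_model(counts, values, eta=1.0):
    """Sketch of BOLT-style optimism on the transition function.

    counts[s, a, s'] are Dirichlet posterior counts; values[s'] is a
    current next-state value estimate.  For each (s, a) we add eta
    artificial observations of the most favorable successor, then use
    the resulting posterior-mean transition model.
    """
    n_s, n_a, _ = counts.shape
    model = np.empty_like(counts, dtype=float)
    for s in range(n_s):
        for a in range(n_a):
            c = counts[s, a].astype(float).copy()
            c[np.argmax(values)] += eta      # optimistic pseudo-observations
            model[s, a] = c / c.sum()        # posterior-mean transition row
    return model
```

Acting greedily in the resulting optimistic MDP drives exploration toward transitions the agent hopes are favorable, without any randomization (hence "almost deterministic").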
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
 In 2nd International Conference on Agents and Artificial Intelligence (ICAART 2010)
, 2009
Abstract

Cited by 7 (6 self)
Abstract: There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to obtain stochastic lower and upper bounds on the value of each tree node. This enables us to use stochastic branch and bound algorithms to search the tree efficiently. This paper proposes some algorithms and examines their complexity in this setting.
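The search scheme the abstract outlines can be sketched as follows, assuming abstract `expand` and `bounds` callbacks (the paper's algorithms use stochastic bound estimates; the bounds here are deterministic for clarity):

```python
import heapq

def branch_and_bound(root, expand, bounds, iters=100):
    """Sketch of branch-and-bound over a planning tree.

    expand(node) returns child nodes; bounds(node) returns a
    (lower, upper) pair on the node's value.  We repeatedly expand the
    leaf with the highest upper bound and prune leaves whose upper
    bound falls below the best lower bound seen so far.
    """
    lo, up = bounds(root)
    best_lower = lo
    frontier = [(-up, 0, root)]          # max-heap on upper bound
    tie = 1                              # tiebreaker to keep heap comparable
    for _ in range(iters):
        if not frontier:
            break
        neg_up, _, node = heapq.heappop(frontier)
        if -neg_up <= best_lower:        # prune: cannot beat incumbent
            continue
        for child in expand(node):
            lo, up = bounds(child)
            best_lower = max(best_lower, lo)
            heapq.heappush(frontier, (-up, tie, child))
            tie += 1
    return best_lower
```

Because expansion is always directed at the node with the largest upper bound, the tree is searched best-first, and the lower bounds let whole subtrees be discarded without being built.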
Optimistic planning for belief-augmented Markov decision processes
 In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
, 2013
Abstract

Cited by 3 (2 self)
Abstract—This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to contexts where the transition model of the MDP is initially unknown and progressively learned through interactions with the environment. The knowledge about the unknown MDP is represented with a probability distribution over all possible transition models using Dirichlet distributions, and the BOP algorithm plans in the belief-augmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal when the budget parameter increases to infinity. Preliminary empirical validations show promising performance.
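The Dirichlet representation the abstract describes reduces to simple count bookkeeping. A minimal sketch (class and parameter names are illustrative, not taken from the paper):

```python
import numpy as np

class DirichletBelief:
    """Sketch of a belief over transition models: one Dirichlet per
    (state, action) pair, stored as pseudo-counts.  A belief-augmented
    state, as used by BOP-style planners, is simply (s, counts)."""

    def __init__(self, n_states, n_actions, prior=1.0):
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        """Bayes update on observing s --a--> s_next: increment a count."""
        self.counts[s, a, s_next] += 1.0

    def mean_model(self):
        """Posterior-mean transition probabilities."""
        return self.counts / self.counts.sum(axis=-1, keepdims=True)
```

Because the Dirichlet is conjugate to the multinomial transition likelihood, each observed transition updates the belief exactly, at the cost of one increment.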
Active Learning of MDP models
Abstract
Abstract. We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is defined a priori. Our proposal is to cast the active learning task as a utility maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. As computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to approximately solve this optimization problem. Despite the suboptimality of this technique, we show experimentally that our proposal is efficient in a number of domains.
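One illustrative belief-dependent reward in the spirit of the abstract (the paper derives its own three criteria; this particular information-gain form is an assumption for the sketch):

```python
import numpy as np

def info_gain_reward(counts, s, a, s_next):
    """Change in Shannon entropy of the posterior-mean transition row
    P(. | s, a) caused by observing s --a--> s_next.

    counts[s, a, s'] are Dirichlet counts; a positive reward means the
    observation made the predictive distribution more certain.
    """
    def entropy(c):
        p = c / c.sum()
        return -np.sum(p * np.log(p))

    before = counts[s, a].astype(float)
    after = before.copy()
    after[s_next] += 1.0
    return entropy(before) - entropy(after)
```

Note the reward depends only on the belief, not on any external reward signal, which is exactly what makes the approach usable when no reward function is defined a priori.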
Linear Bayesian Reinforcement Learning
 In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI)
Abstract
This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, and in particular when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic programming on the sampled model. This form of approximate Thompson sampling results in good exploration in unknown environments. The approach can also be seen as a Bayesian generalisation of least-squares policy iteration, where the empirical transition matrix is replaced with a sample from the posterior.
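The sample-then-plan loop the abstract describes can be sketched in the tabular case (the paper itself uses a linear Gaussian model; the Dirichlet posterior and value iteration here are a simplified stand-in):

```python
import numpy as np

def thompson_policy(counts, rewards, gamma=0.95, iters=200, rng=None):
    """Tabular sketch of posterior-sampling RL: draw one transition
    model from the Dirichlet posterior, run value iteration on the
    sample, and act greedily in it.

    counts[s, a, s'] are Dirichlet counts; rewards[s, a] is the known
    reward function (assumed here for simplicity).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_s, n_a, _ = counts.shape
    # One sampled transition row per (s, a): Thompson's single posterior draw.
    P = np.array([[rng.dirichlet(counts[s, a]) for a in range(n_a)]
                  for s in range(n_s)])
    V = np.zeros(n_s)
    for _ in range(iters):                 # value iteration on the sample
        Q = rewards + gamma * P @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                # greedy policy for this draw
```

Randomness enters only through the posterior draw, so exploration comes for free: models that are still plausible under the posterior occasionally get sampled and their policies tried.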