Results 1-10 of 37
Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
ISAIM (online proceedings), 2008
Cited by 37 (8 self)
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to specify only from domain knowledge. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a “model-uncertainty” POMDP. Coupled with model-directed queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems.
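The idea of using Bayes risk to decide when to query can be illustrated with a minimal sketch. It assumes a hypothetical setting: a finite set of candidate models, each with a posterior weight and precomputed action values Q_m(a). The agent picks the action with the least expected loss and asks for help when that minimum (the Bayes risk) exceeds an assumed query cost. The loss definition, the inputs, and the names `bayes_risk`, `act_or_query`, and `query_cost` are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple


def bayes_risk(weights: List[float],
               q_values: List[Dict[str, float]]) -> Tuple[str, float]:
    """Return the action with the least expected loss and that loss (the Bayes risk).

    weights[m]  -- posterior probability of candidate model m (assumed to sum to 1)
    q_values[m] -- action values Q_m(a) under model m (hypothetical inputs)
    Assumed loss of action a under model m: max_b Q_m(b) - Q_m(a).
    """
    best_action, best_risk = "", float("inf")
    for a in q_values[0]:
        risk = sum(w * (max(q.values()) - q[a]) for w, q in zip(weights, q_values))
        if risk < best_risk:
            best_action, best_risk = a, risk
    return best_action, best_risk


def act_or_query(weights: List[float], q_values: List[Dict[str, float]],
                 query_cost: float) -> Tuple[str, float]:
    """Query when acting is expected to cost more than a query (assumed decision rule)."""
    action, risk = bayes_risk(weights, q_values)
    return ("query", risk) if risk > query_cost else (action, risk)
```

With two equally likely models that disagree about the best action, either action has expected loss 0.5, so the sketch queries; once one model dominates the posterior, the risk drops and it simply acts.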
A Monte-Carlo AIXI Approximation
2009
Cited by 28 (9 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
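A Bayesian mixture over all prediction suffix trees up to a fixed depth is what Context Tree Weighting (CTW) computes. Below is a minimal sketch of standard binary CTW with Krichevsky-Trofimov (KT) estimators; it is not the agent-specific, action-conditional extension used for MC-AIXI, and padding the first context with zeros is an assumption of this sketch. Each update touches only the depth + 1 nodes on the current context path.

```python
import copy
import math
from typing import Dict, List


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class Node:
    """One context in the tree: KT statistics plus the weighted probability."""

    def __init__(self) -> None:
        self.counts = [0, 0]   # zeros and ones seen in this context
        self.log_kt = 0.0      # log KT probability of those symbols
        self.log_w = 0.0       # log weighted (mixture) probability
        self.children: Dict[int, "Node"] = {}


class CTW:
    """Binary Context Tree Weighting of fixed depth (a mixture over suffix trees)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.root = Node()
        self.history: List[int] = [0] * depth  # assumed zero padding for the first context

    def update(self, bit: int) -> None:
        # Walk from the root along the context, most recent bit first.
        path = [self.root]
        for d in range(self.depth):
            path.append(path[-1].children.setdefault(self.history[-1 - d], Node()))
        # Update bottom-up: leaves use KT alone, inner nodes mix KT with their children.
        for level in range(self.depth, -1, -1):
            node = path[level]
            node.log_kt += math.log((node.counts[bit] + 0.5) / (sum(node.counts) + 1.0))
            node.counts[bit] += 1
            if level == self.depth:
                node.log_w = node.log_kt
            else:
                # Unvisited children have probability 1 (log 0), so summing visited ones suffices.
                kids = sum(child.log_w for child in node.children.values())
                node.log_w = log_add(math.log(0.5) + node.log_kt, math.log(0.5) + kids)
        self.history.append(bit)

    def prob_one(self) -> float:
        """Predictive probability that the next bit is 1: P_w(x, 1) / P_w(x)."""
        trial = copy.deepcopy(self)
        trial.update(1)
        return math.exp(trial.root.log_w - self.root.log_w)
```

On an alternating sequence such as 1, 0, 1, 0, ..., a depth-1 tree quickly assigns most weight to the split model, so after a 0 it predicts a 1 with high probability.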
The Infinite Partially Observable Markov Decision Process
Cited by 26 (2 self)
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems.
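A state space that grows with exploration is the behavior a Dirichlet-process prior provides. The sketch below shows only the simplest ingredient, a Chinese-restaurant-process (CRP) predictive over next states for a single (state, action) pair; the iPOMDP itself uses a richer hierarchical construction, and the concentration parameter `alpha`, the `NEW_STATE` sentinel, and the function names are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Dict, Hashable

NEW_STATE = "new"  # hypothetical sentinel meaning "a state not visited yet"


def crp_next_state_probs(counts: Counter, alpha: float) -> Dict[Hashable, float]:
    """Predictive distribution over next states under a CRP prior.

    counts -- how often each visited next state followed this (s, a)
    alpha  -- assumed concentration: larger values favor creating new states
    """
    total = sum(counts.values())
    probs: Dict[Hashable, float] = {s: n / (total + alpha) for s, n in counts.items()}
    probs[NEW_STATE] = alpha / (total + alpha)
    return probs


def sample_next_state(counts: Counter, alpha: float, rng: random.Random) -> int:
    """Sample a next state; a fresh integer id is created when NEW_STATE is drawn."""
    probs = crp_next_state_probs(counts, alpha)
    outcomes = list(probs)
    choice = rng.choices(outcomes, weights=[probs[o] for o in outcomes])[0]
    if choice == NEW_STATE:
        choice = len(counts)  # states are numbered 0, 1, 2, ... in order of discovery
    counts[choice] += 1
    return choice
```

Only states that have actually been sampled appear in `counts`, which mirrors the abstract's point that only visited states are modeled explicitly.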
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
Cited by 19 (2 self)
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation tradeoff in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains, by introducing the Bayes-Adaptive Partially Observable Markov Decision Processes. This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and agent’s return improve as a function of experience.
Keywords: reinforcement learning, Bayesian inference, partially observable Markov decision processes
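Learning the model and tracking the state at the same time can be sketched as an exact belief update over Bayes-adaptive hyperstates, where each hyperstate pairs a physical state with Dirichlet counts for transitions (`phi`) and observations (`psi`). This is a minimal sketch for tiny discrete problems; the nested-tuple count layout and function names are assumptions. The exact belief's support grows with every step, which is why finite approximations such as those the abstract mentions matter.

```python
from collections import defaultdict
from typing import Dict, Tuple

Counts = Tuple[Tuple[Tuple[int, ...], ...], ...]  # assumed layout: counts[a][s][x]
Hyper = Tuple[int, Counts, Counts]                 # (state, phi, psi)


def increment(counts: Counts, a: int, s: int, x: int) -> Counts:
    """Return a copy of counts with counts[a][s][x] increased by one."""
    return tuple(
        tuple(
            tuple(c + 1 if (ai, si, xi) == (a, s, x) else c for xi, c in enumerate(row))
            for si, row in enumerate(mat))
        for ai, mat in enumerate(counts))


def belief_update(belief: Dict[Hyper, float], a: int, z: int) -> Dict[Hyper, float]:
    """Bayes update of the hyperstate belief after taking action a and observing z."""
    new_belief: Dict[Hyper, float] = defaultdict(float)
    for (s, phi, psi), p in belief.items():
        for s2 in range(len(phi[a][s])):
            p_trans = phi[a][s][s2] / sum(phi[a][s])  # expected P(s2 | s, a) under counts
            p_obs = psi[a][s2][z] / sum(psi[a][s2])     # expected P(z | s2, a) under counts
            w = p * p_trans * p_obs
            if w > 0.0:
                # The counts are part of the state: each successor hyperstate records
                # the transition and observation it assumes happened.
                key = (s2, increment(phi, a, s, s2), increment(psi, a, s2, z))
                new_belief[key] += w
    total = sum(new_belief.values())
    return {k: v / total for k, v in new_belief.items()}
```

For example, with uniform transition counts and observation counts of (9, 1) for state 0 and (1, 9) for state 1, observing symbol 0 puts probability 0.9 on the hyperstates whose physical state is 0.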
Reinforcement Learning via AIXI Approximation
Cited by 12 (8 self)
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.
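Monte Carlo Tree Search decides which action to simulate at each node by trading off estimated value against uncertainty. A minimal sketch of the standard UCB1 selection rule used in UCT follows; the MCTS variant in the paper may differ in details, returns are assumed to lie in [0, 1], and the exploration constant `c` and the class names are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class ActionStats:
    visits: int = 0
    mean_return: float = 0.0


@dataclass
class TreeNode:
    visits: int = 0
    actions: Dict[Hashable, ActionStats] = field(default_factory=dict)


def select_action(node: TreeNode, c: float = math.sqrt(2)) -> Hashable:
    """UCB1: try each action once, then maximize mean + c * sqrt(ln N / n)."""
    for a, stats in node.actions.items():
        if stats.visits == 0:
            return a
    log_n = math.log(node.visits)
    return max(node.actions,
               key=lambda a: node.actions[a].mean_return
               + c * math.sqrt(log_n / node.actions[a].visits))


def backup(node: TreeNode, action: Hashable, ret: float) -> None:
    """Update visit counts and the running mean return after one simulation."""
    node.visits += 1
    stats = node.actions[action]
    stats.visits += 1
    stats.mean_return += (ret - stats.mean_return) / stats.visits
```

The square-root bonus shrinks as an action is tried more often, so rarely sampled actions keep getting occasional simulations even when their current mean is low.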
Nonparametric Bayesian Policy Priors for Reinforcement Learning
In Neural Information Processing Systems (NIPS), 2010
Cited by 12 (2 self)
We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.
Sequentially Optimal Repeated Coalition Formation under Uncertainty
Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS)
Cited by 11 (2 self)
Coalition formation is a central problem in multiagent systems research, but most models assume common knowledge of agent types. In practice, however, agents are often unsure of the types or capabilities of their potential partners, but gain information about these capabilities through repeated interaction. In this paper, we propose a novel Bayesian, model-based reinforcement learning framework for this problem, assuming that coalitions are formed (and tasks undertaken) repeatedly. Our model allows agents to refine their beliefs about the types of others as they interact within a coalition. The model also allows agents to make explicit tradeoffs between exploration (forming “new” coalitions to learn more about the types of new potential partners) and exploitation (relying on partners about which more is known), using value of information to define optimal exploration policies. Our framework effectively integrates decision making during repeated coalition formation under type uncertainty with Bayesian reinforcement learning techniques. Specifically, we present several learning algorithms to approximate the optimal Bayesian solution to the repeated coalition formation and type-learning problem, providing tractable means to ensure good sequential performance. We evaluate our algorithms in a variety of settings, showing that one method in particular exhibits consistently good performance in practice. We also demonstrate the ability of our model to facilitate knowledge transfer across different dynamic tasks.
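The exploration/exploitation trade-off described here rests on value of information. A minimal sketch of the textbook myopic (one-step) expected value of sample information follows, for a single hypothetical choice between a known partner with a fixed expected payoff and a new partner of uncertain type. The payoffs, the noisy "signal" model, and the function name are illustrative assumptions, not the paper's algorithms.

```python
from typing import Dict


def expected_value_of_information(
    prior: Dict[str, float],                   # belief over the new partner's type
    payoff_new: Dict[str, float],              # payoff of partnering with each type
    payoff_known: float,                       # expected payoff of the known partner
    likelihood: Dict[str, Dict[str, float]],   # assumed P(signal | type)
) -> float:
    """Myopic VOI = E_signal[max_a E[U(a) | signal]] - max_a E[U(a)]."""

    def best_value(belief: Dict[str, float]) -> float:
        exp_new = sum(belief[t] * payoff_new[t] for t in belief)
        return max(exp_new, payoff_known)

    signals = next(iter(likelihood.values())).keys()
    value_with_info = 0.0
    for z in signals:
        p_z = sum(prior[t] * likelihood[t][z] for t in prior)
        if p_z == 0.0:
            continue
        posterior = {t: prior[t] * likelihood[t][z] / p_z for t in prior}
        value_with_info += p_z * best_value(posterior)
    return value_with_info - best_value(prior)
```

With a 50/50 belief over a "good" partner (payoff 10) and a "bad" one (payoff 0), and a known partner worth 6, a perfectly informative signal is worth 2; an uninformative one is worth 0, so exploring a new coalition pays only when what it reveals can change later choices.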
Deisenroth: Probabilistic Inference for Fast Learning in Control
In Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL), 2008
Cited by 6 (1 self)
We provide a novel framework for very fast model-based reinforcement learning in continuous state and action spaces. The framework requires probabilistic models that explicitly characterize their levels of confidence. Within this framework, we use flexible, nonparametric models to describe the world based on previously collected experience. We demonstrate learning on the cart-pole problem in a setting where we provide very limited prior knowledge about the task. Learning progresses rapidly, and a good policy is found after only a handful of iterations.
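A probabilistic model that "explicitly characterizes its level of confidence" can be, for example, Gaussian-process (GP) regression, whose prediction comes with a variance that grows away from the data. The sketch below is a plain one-dimensional GP with a squared-exponential kernel; the kernel, its hyperparameters, and the noise level are assumptions for illustration, since the abstract does not specify the exact models used.

```python
import math
from typing import List, Tuple


def sq_exp(x1: float, x2: float, length_scale: float = 1.0, signal_var: float = 1.0) -> float:
    """Squared-exponential kernel (assumed hyperparameters)."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower: List[List[float]], b: List[float]) -> List[float]:
    """Solve L y = b by forward substitution."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i])
    return y


def solve_upper_t(lower: List[List[float]], y: List[float]) -> List[float]:
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def gp_predict(xs: List[float], ys: List[float], x_star: float,
               noise_var: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and variance of the latent function at x_star."""
    k = [[sq_exp(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    lower = cholesky(k)
    alpha = solve_upper_t(lower, solve_lower(lower, ys))  # (K + noise I)^-1 y
    k_star = [sq_exp(x_star, a) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(lower, k_star)
    var = sq_exp(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var
```

Near the training inputs the predictive variance is small; far from them it returns to the prior variance, which is the kind of confidence signal a planner can use to avoid trusting the model where it has no data.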
Monte Carlo Bayesian reinforcement learning
In Proceedings of the 29th International Conference on Machine Learning, 2012
Cited by 6 (2 self)
Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them. This paper presents Monte Carlo BRL (MC-BRL), a simple and general approach to BRL. MC-BRL samples a priori a finite set of hypotheses for the model parameter values and forms a discrete partially observable Markov decision process (POMDP) whose state space is a cross product of the state space for the reinforcement learning task and the sampled model parameter space. The POMDP does not require conjugate distributions for belief representation, as earlier works do, and can be solved relatively easily with point-based approximation algorithms. MC-BRL naturally handles both fully and partially observable worlds. Theoretical and experimental results show that the discrete POMDP approximates the underlying BRL task well with guaranteed performance.
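The cross-product construction can be sketched as follows: sample K complete transition models once, up front, then track a belief over the hidden hypothesis index alongside the task state. Here the task state is assumed fully observed, so only the index is hidden and never changes, which makes the belief update plain Bayes' rule. The uniform Dirichlet prior, K, and the function names are illustrative assumptions, and solving the resulting POMDP (for example with a point-based method) is not shown.

```python
import random
from typing import List

Model = List[List[List[float]]]  # assumed layout: model[s][a][s2] = P(s2 | s, a)


def sample_dirichlet(alphas: List[float], rng: random.Random) -> List[float]:
    """Draw one probability vector from Dirichlet(alphas) via Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_hypotheses(k: int, n_states: int, n_actions: int,
                      rng: random.Random) -> List[Model]:
    """Sample k complete transition models a priori (assumed uniform Dirichlet prior)."""
    return [[[sample_dirichlet([1.0] * n_states, rng) for _ in range(n_actions)]
             for _ in range(n_states)]
            for _ in range(k)]


def update_hypothesis_belief(belief: List[float], models: List[Model],
                             s: int, a: int, s2: int) -> List[float]:
    """Belief over the hidden index k after observing the transition s --a--> s2.

    In the product POMDP, T((s, k), a, (s2, k)) = models[k][s][a][s2] and the
    index k itself never changes, so the update is Bayes' rule on that likelihood.
    """
    unnorm = [b * m[s][a][s2] for b, m in zip(belief, models)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Because the hypotheses are fixed samples rather than a parametric family, no conjugate prior is needed: the belief is just a vector of K weights.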
Model-Based Bayesian Reinforcement Learning for Dialogue Management
In Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Cited by 4 (3 self)
Reinforcement learning methods are increasingly used to optimise dialogue policies from experience. Most current techniques are model-free: they directly estimate the utility of various actions, without an explicit model of the interaction dynamics. In this paper, we investigate an alternative strategy grounded in model-based Bayesian reinforcement learning. Bayesian inference is used to maintain a posterior distribution over the model parameters, reflecting the model uncertainty. This parameter distribution is gradually refined as more data is collected and simultaneously used to plan the agent’s actions. Within this learning framework, we carried out experiments with two alternative formalisations of the transition model, one encoded with standard multinomial distributions, and one structured with probabilistic rules. We demonstrate the potential of our approach with empirical results on a user simulator constructed from Wizard-of-Oz data in a human–robot interaction scenario. The results illustrate in particular the benefits of capturing prior domain knowledge with high-level rules.
Index Terms: dialogue management, reinforcement learning, Bayesian inference, probabilistic models, POMDPs
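The contrast between the two transition-model formalisations can be illustrated with conjugate counts. In the sketch below, a per-context multinomial keeps separate Dirichlet counts for every context, while a rule-style model ties one Beta-distributed parameter across all contexts the rule covers, so every observation updates it. The single "user action matches intent" rule, the prior pseudo-counts, and all names are hypothetical; the paper's probabilistic-rule formalism is richer than this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class PerContextMultinomial:
    """Independent Dirichlet(prior) posterior over outcomes for each context."""

    def __init__(self, outcomes: List[str], prior: float = 1.0) -> None:
        self.outcomes = list(outcomes)
        self.counts: Dict[str, Dict[str, float]] = defaultdict(
            lambda: {o: prior for o in self.outcomes})

    def observe(self, context: str, outcome: str) -> None:
        self.counts[context][outcome] += 1.0

    def prob(self, context: str, outcome: str) -> float:
        """Posterior-mean probability of an outcome in a context."""
        c = self.counts[context]
        return c[outcome] / sum(c.values())


class TiedRule:
    """Hypothetical rule: the user's action matches their intent with probability theta.

    One Beta posterior over theta is shared by all intents; the remaining mass
    (1 - theta) is split uniformly over the other outcomes (an assumption).
    """

    def __init__(self, outcomes: List[str], prior: Tuple[float, float] = (1.0, 1.0)) -> None:
        self.outcomes = list(outcomes)
        self.match, self.mismatch = prior

    def observe(self, intent: str, action: str) -> None:
        if action == intent:
            self.match += 1.0
        else:
            self.mismatch += 1.0

    def prob(self, intent: str, action: str) -> float:
        theta = self.match / (self.match + self.mismatch)
        if action == intent:
            return theta
        return (1.0 - theta) / (len(self.outcomes) - 1)
```

After eight observations in which greeting users greet, the tied rule already predicts that a user intending to ask will ask (probability 0.9), while the per-context model still falls back to its uniform prior (1/3) for that unseen context.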