Results 1–10 of 19
Learning Model-free Robot Control by a Monte Carlo EM Algorithm
"... We address the problem of learning robot control by modelfree reinforcement learning (RL). We adopt the probabilistic model of Vlassis and Toussaint (2009) for modelfree RL, and we propose a Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller para ..."
Abstract

Cited by 16 (2 self)
We address the problem of learning robot control by model-free reinforcement learning (RL). We adopt the probabilistic model of Vlassis and Toussaint (2009) for model-free RL, and we propose a Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories. MCEM is related to, and generalizes, the PoWER algorithm of Kober and Peters (2009). In the finite-horizon case MCEM reduces precisely to PoWER, but MCEM can also handle the discounted infinite-horizon case. An interesting result is that the infinite-horizon case can be viewed as a ‘randomized’ version of the finite-horizon case, in the sense that the length of each sampled trajectory is a random draw from an appropriately constructed geometric distribution. We provide some preliminary experiments demonstrating the effects of fixed (PoWER) vs. randomized (MCEM) horizon length in two simulated and one real robot control task.
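The geometric-horizon observation in this abstract can be sketched in a few lines. The discount factor value and the sampler below are illustrative assumptions for the sketch, not details taken from the paper:

```python
import random

def sample_horizon(gamma, rng=random):
    """Draw a trajectory length from a geometric distribution.

    Under a discount factor gamma, a discounted infinite-horizon
    objective can be treated as a finite-horizon one whose length T
    is geometric: P(T = t) = (1 - gamma) * gamma**(t - 1), t >= 1.
    """
    t = 1
    while rng.random() < gamma:  # continue the episode with prob. gamma
        t += 1
    return t

# Empirically, the expected horizon is 1 / (1 - gamma).
lengths = [sample_horizon(0.9, random.Random(i)) for i in range(10000)]
mean_len = sum(lengths) / len(lengths)
```

For gamma = 0.9 the sample mean comes out close to 10, matching 1 / (1 - gamma); a fixed-horizon method like PoWER would instead run every rollout for the same length.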
Dynamic Policy Programming
 Journal of Machine Learning Research
"... The following full text is a preprint version which may differ from the publisher's version. ..."
Abstract

Cited by 7 (1 self)
Perception, Action and Utility: The Tangled Skein
, 2011
"... Normative theories of learning and decisionmaking are motivated by a computationallevel analysis of the task facing an animal: what should the animal do to maximize future reward? However, much of the recent excitement in this field originates in how the animal arrives at its decisions and reward ..."
Abstract

Cited by 4 (0 self)
Normative theories of learning and decision-making are motivated by a computational-level analysis of the task facing an animal: what should the animal do to maximize future reward? However, much of the recent excitement in this field originates in how the animal arrives at its decisions and reward predictions—algorithmic questions about which the computational-level analysis is silent.
Monte-Carlo Expectation Maximization for Decentralized POMDPs
"... We address two significant drawbacks of stateoftheart solvers of decentralized POMDPs (DECPOMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DECPOMDPs via a reduction to the ma ..."
Abstract

Cited by 2 (0 self)
We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum-likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naïve implementation of MCEM is inadequate in multiagent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents.
Inference strategies for solving semi-Markov decision processes
"... SemiMarkov decision processes (SMDPs) generalize standard MDPs to domains where time is not discretized equally between every set of states and actions [3]. Instead we can define a jumpMarkov process where the amount of time spent in each state is a stochastic random variable. This formulation giv ..."
Abstract

Cited by 1 (0 self)
Semi-Markov decision processes (SMDPs) generalize standard MDPs to domains where time is not discretized equally between every set of states and actions [3]. Instead we can define a jump-Markov process where the amount of time spent in each state is a random variable. This formulation gives us an intuitive way to reason about actions where it is also necessary to take into account how long these actions will take to perform. Formally, we can define an SMDP as a continuous-time controlled stochastic process (x(t), u(t)) consisting, respectively, of states and actions at every point in time t, where state transitions occur at random arrival times T_n. In particular, the process is stationary in between jumps, i.e. x(t) = x_n and u(t) = u_n for T_n <= t < T_{n+1}.
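The jump-process definition in this abstract can be illustrated with a minimal simulation. The exponential holding times and uniform next-state transitions below are assumptions for the sketch (one common choice), not part of the SMDP definition itself:

```python
import random

def simulate_smdp(n_jumps, rate=1.0, n_states=3, seed=0):
    """Sketch of a jump-Markov process: the state x(t) is constant
    between jumps, and holding times are random (here exponential)."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    jump_times, states = [t], [x]
    for _ in range(n_jumps):
        t += rng.expovariate(rate)      # random time spent in state x
        x = rng.randrange(n_states)     # next state (uniform, for illustration)
        jump_times.append(t)
        states.append(x)
    return jump_times, states

def state_at(t, jump_times, states):
    """x(t) = x_n for T_n <= t < T_{n+1}: piecewise constant between jumps."""
    i = 0
    while i + 1 < len(jump_times) and jump_times[i + 1] <= t:
        i += 1
    return states[i]
```

An agent reasoning in this model must weigh not only which transition an action induces but also how long the action occupies the process, which is exactly the extra structure SMDPs add over MDPs.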
Scalable Planning and Learning for Multiagent POMDPs
"... Bayesian methods for reinforcement learning (BRL) allow model uncertainty to be considered explicitly and offer a principled way of dealing with the exploration/exploitation tradeoff. However, for multiagent systems there have been few such approaches, and none of them apply to problems with state ..."
Abstract

Cited by 1 (0 self)
Bayesian methods for reinforcement learning (BRL) allow model uncertainty to be considered explicitly and offer a principled way of dealing with the exploration/exploitation tradeoff. However, for multiagent systems there have been few such approaches, and none of them apply to problems with state uncertainty. In this paper, we fill this gap by proposing a BRL framework for multiagent partially observable Markov decision processes. It considers a team of agents that operates in a centralized fashion, but has uncertainty about both the state and the model of the environment, essentially transforming the learning problem to a planning problem. To deal with the complexity of this planning problem as well as other planning problems with a large number of actions and observations, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. Experimental results show that we are able to provide high-quality solutions to large problems even with a large amount of initial model uncertainty. We also show that our approach applies in the (traditional) planning setting, demonstrating significantly more efficient planning in factored settings.
Compositional Policy Priors
, 2013
"... This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametr ..."
Abstract

Cited by 1 (0 self)
This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametrics, we define several families of policy priors that express compositional, abstract structure in a domain. Compositionality is expressed using probabilistic context-free grammars, enabling a compact representation of hierarchically organized subtasks. Useful sequences of subtasks can be cached and reused by extending the grammars nonparametrically using Fragment Grammars. We present Monte Carlo methods for performing inference, and show how structured policy priors lead to substantially faster learning in complex domains compared to methods without inductive biases.
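The probabilistic context-free grammar idea in this abstract can be sketched with a toy sampler. The grammar, its subtask symbols ("GOTO", "PICK"), and the production probabilities below are all made up for illustration; they are not the grammars or Fragment Grammars used in the paper:

```python
import random

# Toy PCFG over subtasks: a TASK is either one GOTO-PICK pair or a
# pair followed recursively by another TASK.
GRAMMAR = {
    "TASK": [(0.5, ["GOTO", "PICK"]),
             (0.5, ["GOTO", "PICK", "TASK"])],
}

def sample_policy_sketch(symbol="TASK", rng=None, depth=0, max_depth=10):
    """Sample a subtask sequence from the PCFG by expanding
    nonterminals top-down according to the production probabilities."""
    rng = rng or random.Random(0)
    if symbol not in GRAMMAR:          # terminal: a primitive subtask
        return [symbol]
    if depth >= max_depth:             # truncate unbounded recursion
        return []
    r, acc = rng.random(), 0.0
    for p, rhs in GRAMMAR[symbol]:
        acc += p
        if r <= acc:                   # pick the production whose mass covers r
            out = []
            for s in rhs:
                out.extend(sample_policy_sketch(s, rng, depth + 1, max_depth))
            return out
    return []
```

The recursive production is what makes the prior compositional: a hierarchically structured task decomposes into reusable subtask sequences, which is the structure a Fragment Grammar can then cache.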
Monte-Carlo Expectation Maximization for Decentralized POMDPs (Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence)
"... We address two significant drawbacks of stateoftheart solvers of decentralized POMDPs (DECPOMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DECPOMDPs via a reduction to the ma ..."
Abstract
 Add to MetaCart
We address two significant drawbacks of stateoftheart solvers of decentralized POMDPs (DECPOMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DECPOMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a modelfree version of this approach that employs MonteCarlo EM (MCEM). While a naïve implementation of MCEM is inadequate in multiagent settings, we introduce several improvements in sampling that produce highquality results on a variety of DECPOMDP benchmarks, including large problems with thousands of agents. 1