Results 1–10 of 45
Robot Trajectory Optimization using Approximate Inference
"... The general stochastic optimal control (SOC) problem in robotics scenarios is often too complex to be solved exactly and in near real time. A classical approximate solution is to first compute an optimal (deterministic) trajectory and then solve a local linearquadraticgaussian (LQG) perturbation m ..."
Abstract

Cited by 69 (16 self)
 Add to MetaCart
(Show Context)
The general stochastic optimal control (SOC) problem in robotics scenarios is often too complex to be solved exactly and in near real time. A classical approximate solution is to first compute an optimal (deterministic) trajectory and then solve a local linear-quadratic-Gaussian (LQG) perturbation model to handle the system stochasticity. We present a new algorithm for this approach which improves upon previous algorithms like iLQG. We consider a probabilistic model for which the maximum likelihood (ML) trajectory coincides with the optimal trajectory and which, in the LQG case, reproduces the classical SOC solution. The algorithm then utilizes approximate inference methods (similar to expectation propagation) that efficiently generalize to non-LQG systems. We demonstrate the algorithm on a simulated 39-DoF humanoid robot.
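
For orientation, the LQG baseline this line of work builds on reduces to a backward Riccati recursion. Below is a minimal sketch of the classical finite-horizon LQR backward pass; the function name, double-integrator dynamics, and cost weights are illustrative choices of ours, not taken from the paper.

import numpy as np

def lqr_backward_pass(A, B, Q, R, Q_T, horizon):
    """Classical finite-horizon LQR: backward Riccati recursion.

    Dynamics x_{t+1} = A x_t + B u_t, stage cost x'Qx + u'Ru, terminal
    cost x'Q_T x. Returns gains K_t so the optimal control is u_t = -K_t x_t.
    """
    V = Q_T                                 # value-function Hessian at the horizon
    gains = []
    for _ in range(horizon):
        # Minimize u'Ru + (Ax + Bu)' V (Ax + Bu) over u.
        K = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
        V = Q + A.T @ V @ (A - B @ K)       # Riccati update
        gains.append(K)
    return list(reversed(gains))            # gains[t] applies at time t

# Toy double-integrator example.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
K = lqr_backward_pass(A, B, np.eye(2), 0.1 * np.eye(1), 10 * np.eye(2), 50)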
Hierarchical POMDP Controller Optimization by Likelihood Maximization
"... Planning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. (2006) recently showed that the hierarchy discovery problem can be framed as a nonconvex optimization problem. However, the inherent computational difficulty of solving such an optimi ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
(Show Context)
Planning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. (2006) recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint et al. (2006) developed a method to solve planning problems by maximum-likelihood estimation. In this paper, we show how the hierarchy discovery problem in partially observable domains can be tackled using a similar maximum-likelihood approach. Our technique first transforms the problem into a dynamic Bayesian network through which a hierarchical structure can naturally be discovered while optimizing the policy. Experimental results demonstrate that this approach scales better than previous techniques based on non-convex optimization.
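
The maximum-likelihood planning idea credited here to Toussaint et al. can be illustrated in a few lines: rescale reward to [0, 1], treat it as the probability of a binary "success" event, and run EM so the policy becomes the posterior over actions given success. A minimal single-decision sketch with made-up numbers (the full method handles sequences and hierarchical controllers, which this omits):

import numpy as np

p_success = np.array([0.2, 0.7, 0.5])   # p(R=1 | a) for three actions (invented)
policy = np.full(3, 1.0 / 3.0)          # uniform initial policy

for _ in range(20):
    # E-step: posterior over actions given the success event R = 1.
    posterior = policy * p_success
    posterior /= posterior.sum()
    # M-step: the new policy is exactly this posterior.
    policy = posterior

print(policy)  # mass concentrates on the highest-reward action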
Anytime planning for decentralized POMDPs using expectation maximization
In UAI, 2010
"... Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. While finitehorizon DECPOMDPs have enjoyed significant success, progress remains slow for the infinitehorizon case mainly due to the inherent complexity of optimizing stochastic controllers representi ..."
Abstract

Cited by 15 (7 self)
 Add to MetaCart
Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against state-of-the-art solvers.
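
The mixture-of-DBNs reformulation rests on a simple identity: with a geometric horizon prior p(T) = (1 - gamma) * gamma^T, the discounted return of a fixed joint policy is proportional to the expected reward averaged over horizon lengths, i.e. sum_t gamma^t E[r_t] = (1/(1 - gamma)) * sum_T p(T) E[r_T]. A sketch under toy dynamics; the transition matrix and rewards below are invented, and P stands for the Markov chain induced by fixing all agent policies:

import numpy as np

gamma = 0.95
P = np.array([[0.9, 0.1], [0.2, 0.8]])  # induced Markov chain (illustrative)
r = np.array([0.0, 1.0])                # expected reward per state
mu = np.array([1.0, 0.0])               # initial state distribution

value, weight = 0.0, 1.0
for _ in range(500):                    # truncate the infinite mixture
    value += weight * mu @ r            # gamma^t * E[r_t]
    mu = mu @ P
    weight *= gamma
print(value)                            # approx. discounted value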
Variational methods for Reinforcement Learning
"... We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained from which the optimal decision policy is formed. The classical maximum likelihood point estimate ..."
Abstract

Cited by 14 (3 self)
 Add to MetaCart
(Show Context)
We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained, from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in the estimate of the transition model, and the resulting policies may consequently lack a sufficient degree of exploration. We consider a Bayesian alternative that maintains a distribution over the transition model so that the resulting policy takes into account the limited experience of the environment. The resulting algorithm is formally intractable, and we discuss two approximate solution methods, Variational Bayes and Expectation Propagation.
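
The contrast between the ML point estimate and the Bayesian alternative is easy to see on a single state-action pair. A sketch with hypothetical transition counts; the Dirichlet prior is a standard conjugate choice for illustration, not necessarily the model used in the paper:

import numpy as np

rng = np.random.default_rng(0)

counts = np.array([3.0, 1.0, 0.0])      # observed transitions to 3 successors

# ML point estimate: assigns zero probability to the unseen successor.
p_ml = counts / counts.sum()

# Bayesian alternative: Dirichlet(1,1,1) prior; posterior draws keep
# uncertainty alive, which is what sustains exploration.
posterior_draws = rng.dirichlet(counts + 1.0, size=5)
print(p_ml)
print(posterior_draws)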
An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards
In Twelfth Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2009
"... We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequentl ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
(Show Context)
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments show that, for the same computational cost, policy optimization methods that rely on analytical tractability achieve higher value than those that rely on simulation.
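
The analytical tractability claimed here comes from the Gaussian-times-Gaussian integral: the expected value of a mixture-of-Gaussians reward under a Gaussian state distribution has the closed form sum_k w_k N(m; mu_k, S + S_k). A sketch of just that identity; the function name and toy numbers are ours:

import numpy as np
from scipy.stats import multivariate_normal

def expected_mog_reward(m, S, weights, means, covs):
    """Expected reward E[r(x)] for x ~ N(m, S), where the reward is a
    mixture of Gaussians r(x) = sum_k w_k N(x; mu_k, S_k)."""
    return sum(
        w * multivariate_normal.pdf(m, mean=mu, cov=S + Sk)
        for w, mu, Sk in zip(weights, means, covs)
    )

# Toy 1-D check.
val = expected_mog_reward(
    m=np.array([0.0]), S=np.array([[1.0]]),
    weights=[1.0], means=[np.array([0.0])], covs=[np.array([[1.0]])],
)
print(val)  # N(0; 0, 2) = 1 / sqrt(4*pi)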
New inference strategies for solving Markov decision processes using reversible jump MCMC
In UAI, 2009
"... In this paper we build on previous work which uses inferences techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higherdimensional spaces. We first ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
(Show Context)
In this paper we build on previous work which uses inference techniques, in particular Markov chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.
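
The basic recipe, before the paper's refinements, is Metropolis-Hastings over policy parameters with a reward-shaped target such as p(theta) proportional to exp(eta * R(theta)). A minimal random-walk sketch; the quadratic stand-in for expected return replaces trajectory sampling, and all constants are invented:

import numpy as np

rng = np.random.default_rng(1)

def expected_return(theta):
    return -np.sum((theta - 2.0) ** 2)   # hypothetical, peaks at theta = 2

eta, theta = 5.0, np.zeros(2)
samples = []
for _ in range(5000):
    proposal = theta + 0.3 * rng.standard_normal(2)  # symmetric random-walk move
    log_accept = eta * (expected_return(proposal) - expected_return(theta))
    if np.log(rng.random()) < log_accept:
        theta = proposal
    samples.append(theta)

print(np.mean(samples[1000:], axis=0))   # concentrates near the optimum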
Natively probabilistic computing
2009
"... I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling al ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling algorithms like rejection sampling and Markov chain Monte Carlo, and solve difficult Bayesian inference problems. I first introduce Church, a probabilistic programming language for describing probabilistic generative processes that induce distributions, which generalizes Lisp, a language for describing deterministic procedures that induce functions. I highlight the ways randomness meshes with the reflectiveness of Lisp to support the representation of structured, uncertain knowledge, including nonparametric Bayesian models from the current literature, programs for decision making under uncertainty, and programs that learn very simple programs from data. I then introduce systematic stochastic search, a recursive algorithm for exact and approximate sampling that generalizes a …
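
The conditioning semantics behind a Church-style query can be illustrated with plain rejection sampling: run the generative program, keep only the runs that satisfy the observed condition. A sketch in Python rather than Lisp; the two-coin model is purely illustrative:

import random

def generative_program(rng):
    a = rng.random() < 0.5
    b = rng.random() < 0.5
    return a, b

def rejection_query(rng, n=100_000):
    hits = total = 0
    for _ in range(n):
        a, b = generative_program(rng)
        if not (a or b):        # condition: at least one heads
            continue            # reject runs that violate it
        total += 1
        hits += a               # query: is the first flip heads?
    return hits / total

print(rejection_query(random.Random(0)))  # approx. 2/3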
A Minimum Relative Entropy Principle for Learning and Acting
J. Artif. Intell. Res., 2010
"... This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adapti ..."
Abstract

Cited by 11 (8 self)
 Add to MetaCart
(Show Context)
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than ordinary probabilistic conditioning. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
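
The mixture-of-experts behavior can be sketched on a toy two-armed bandit: sample an expert from the posterior, act as it would, and update the posterior from the observation likelihood alone, since the action itself is an intervention and carries no evidential weight. All reward tables below are invented:

import numpy as np

rng = np.random.default_rng(2)

reward_prob = np.array([[0.8, 0.2],     # environment 0 (hypothetical)
                        [0.3, 0.9]])    # environment 1
expert_action = np.array([0, 1])        # expert i pulls the best arm of env i
true_env = 1
posterior = np.array([0.5, 0.5])

for _ in range(100):
    i = rng.choice(2, p=posterior)      # sample an expert from the posterior
    a = expert_action[i]                # act as that expert (an intervention)
    r = rng.random() < reward_prob[true_env, a]
    # Update with the likelihood of the *observation* under each environment.
    like = reward_prob[:, a] if r else 1.0 - reward_prob[:, a]
    posterior = posterior * like
    posterior /= posterior.sum()

print(posterior)                        # mass moves onto the true environment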
Tree exploration for Bayesian RL exploration
In Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA), 2008
"... Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time. This is because the resulting planning task takes the form of ..."
Abstract

Cited by 11 (6 self)
 Add to MetaCart
Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time, because the resulting planning task takes the form of a dynamic programming problem on a belief tree with an infinite number of states. The second type employs relatively simple algorithms which are shown to suffer small regret within a distribution-free framework. This paper presents a lower bound and a high-probability upper bound on the optimal value function for the nodes in the Bayesian belief tree, which are analogous to similar bounds in POMDPs. The bounds are then used to create more efficient strategies for exploring the tree. The resulting algorithms are compared with the distribution-free algorithm UCB1, as well as a simpler baseline algorithm, on multi-armed bandit problems.
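
For comparison, UCB1 (the distribution-free baseline named above) pulls the arm maximizing its empirical mean plus the exploration bonus sqrt(2 ln n / n_i). A minimal sketch with made-up arm probabilities:

import math
import random

def ucb1(arm_probs, steps, rng):
    k = len(arm_probs)
    counts, sums = [0] * k, [0.0] * k
    total_reward = 0.0
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1                       # play each arm once first
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return total_reward

print(ucb1([0.3, 0.5, 0.7], 10_000, random.Random(0)))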
Periodic finite state controllers for efficient POMDP and DEC-POMDP planning
In Proc. of the 25th Annual Conf. on Neural Information Processing Systems, 2011
"... Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DECPOMDPs) find a policy for multiple agents. The policy in i ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
(Show Context)
Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty, and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinite-horizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than is possible with regular FSCs.
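
The layered structure is the essential point: a node in layer t can only transition to nodes in layer (t + 1) mod period. A structural sketch with invented controller tables; the optimization algorithms themselves are not shown:

import random

class PeriodicFSC:
    def __init__(self, action_of, next_node, period):
        self.action_of = action_of    # action_of[layer][node] -> action
        self.next_node = next_node    # next_node[layer][node][obs] -> node in next layer
        self.period = period

    def run(self, env_step, steps, rng):
        layer, node = 0, 0
        for _ in range(steps):
            action = self.action_of[layer][node]
            obs = env_step(action, rng)               # environment response
            node = self.next_node[layer][node][obs]   # pick node in next layer
            layer = (layer + 1) % self.period         # advance cyclically

# Two layers, two nodes per layer, binary observations (illustrative).
fsc = PeriodicFSC(
    action_of=[[0, 1], [1, 0]],
    next_node=[[[0, 1], [1, 0]], [[0, 0], [1, 1]]],
    period=2,
)
fsc.run(lambda a, rng: rng.randint(0, 1), 10, random.Random(0))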