Results 1–10 of 52
Policy search for motor primitives in robotics
Advances in Neural Information Processing Systems 22 (NIPS 2008), 2009
Cited by 117 (24 self)
Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.
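The EM-inspired, reward-weighted update this abstract describes can be sketched in a few lines. This is a heavily simplified illustration, not the paper's algorithm: the toy return function, the constants, and the name `THETA_STAR` are all hypothetical.

```python
import numpy as np

# Rough sketch of an EM-style, reward-weighted policy-parameter update in the
# spirit of the abstract (details heavily simplified; the toy return function
# and all constants are hypothetical, not taken from the paper).

rng = np.random.default_rng(1)
THETA_STAR = np.array([1.0, -1.0])  # unknown optimum of the toy task

def rollout_return(theta):
    """Toy episodic return, peaked at THETA_STAR (illustrative only)."""
    return float(np.exp(-np.sum((theta - THETA_STAR) ** 2)))

theta = np.zeros(2)
for _ in range(30):
    # "E-step": sample exploratory perturbations and score them by return.
    eps = 0.3 * rng.normal(size=(50, 2))
    weights = np.array([rollout_return(theta + e) for e in eps])
    # "M-step": move to the reward-weighted average of the perturbed parameters.
    theta = theta + (weights[:, None] * eps).sum(axis=0) / weights.sum()

print(theta)  # drifts toward the high-return region around THETA_STAR
```

Because the update only averages sampled perturbations weighted by return, it needs no gradient of the policy or reward, which is what makes this family of methods attractive for motor-primitive parameters.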
Gaussian Processes in Reinforcement Learning
Advances in Neural Information Processing Systems 16, 2004
Cited by 49 (5 self)
Abstract
We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two-dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.
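The closed-form evaluation the abstract refers to is ordinary GP regression. A minimal sketch, assuming a squared-exponential kernel and made-up states, value targets, and hyperparameters (none of these numbers come from the paper):

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)

# Visited states in a 2-D state space and noisy value estimates at them
# (hypothetical data for illustration).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 0.5, 0.5, 1.0])
noise = 1e-2  # observation-noise variance

K = rbf_kernel(X, X) + noise * np.eye(len(X))
alpha = np.linalg.solve(K, y)  # K^{-1} y, computed once

def value_posterior_mean(x_star):
    """GP posterior mean of the value function at query states (closed form)."""
    return rbf_kernel(np.atleast_2d(np.asarray(x_star, dtype=float)), X) @ alpha

print(value_posterior_mean([0.5, 0.5]))  # posterior mean at an unvisited state
```

The point of the closed form is that once `alpha` is cached, evaluating the value function anywhere in the continuous state space is a single kernel product, with no discretization of the state space.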
A survey of probabilistic models, using the Bayesian programming methodology as a unifying framework
In The Second International Conference on Computational Intelligence, Robotics and Autonomous Systems (CIRAS 2003), 2003
Cited by 28 (14 self)
Abstract
This paper presents a survey of the most common probabilistic models for artefact conception. We use a generic formalism called Bayesian Programming, which we introduce briefly, for reviewing the main probabilistic models found in the literature. Indeed, we show that Bayesian Networks, Markov Localization, Kalman filters, etc., can all be captured under this single formalism. We believe it offers the novice reader a good introduction to these models, while still providing the experienced reader an enriching global view of the field.
Goal-directed decision making in prefrontal cortex: A computational framework
Cited by 18 (2 self)
Abstract
Research in animal learning and behavioral neuroscience has distinguished between two forms of action control: a habit-based form, which relies on stored action values, and a goal-directed form, which forecasts and compares action outcomes based on a model of the environment. While habit-based control has been the subject of extensive computational research, the computational principles underlying goal-directed control in animals have so far received less attention. In the present paper, we advance a computational framework for goal-directed control in animals and humans. We take three empirically motivated points as founding premises: (1) neurons in dorsolateral prefrontal cortex represent action policies, (2) neurons in orbitofrontal cortex represent rewards, and (3) neural computation, across domains, can be appropriately understood as performing structured probabilistic inference. On a purely computational level, the resulting account relates closely to previous work using Bayesian …
Anytime planning for decentralized POMDPs using expectation maximization
In UAI, 2010
Cited by 15 (7 self)
Abstract
Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against state-of-the-art solvers.
An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards
In Twelfth Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2009
Cited by 13 (2 self)
Abstract
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments show that, for the same computational cost, policy optimization methods that rely on analytical tractability achieve higher value than those that rely on simulation.
New inference strategies for solving Markov decision processes using reversible jump MCMC
In UAI, 2009
Cited by 12 (4 self)
Abstract
In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.
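The basic idea behind this line of work, treating control as sampling from a reward-shaped target distribution over policy parameters, can be sketched with plain Metropolis-Hastings. This is a toy illustration of the general recipe, not the paper's method: the reward function, proposal scale, and target form are all hypothetical.

```python
import numpy as np

# Toy sketch: sample a policy parameter theta from a target proportional to
# exp(R(theta)), as in control-as-inference formulations. Everything below
# (reward shape, step size, sample counts) is illustrative, not the paper's.

rng = np.random.default_rng(0)

def expected_reward(theta):
    """Toy expected return, peaked at theta = 2 (illustrative only)."""
    return -(theta - 2.0) ** 2

def metropolis_hastings(n_samples=5000, step=0.5):
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.normal()
        # Accept with probability min(1, exp(R(proposal) - R(theta))).
        if np.log(rng.uniform()) < expected_reward(proposal) - expected_reward(theta):
            theta = proposal
        samples.append(theta)
    return np.array(samples)

samples = metropolis_hastings()
print(samples[1000:].mean())  # concentrates near the reward-maximising parameter
```

The paper's contributions sit on top of this skeleton: a better-shaped target distribution and proposals that decorrelate policy parameters from sampled trajectories, both of which matter once theta is high-dimensional.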
Planning and acting in uncertain environments using probabilistic inference
In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2006
Cited by 11 (1 self)
Abstract
An important problem in robotics is planning and selecting actions for goal-directed behavior in noisy uncertain environments. The problem is typically addressed within the framework of partially observable Markov decision processes (POMDPs). Although efficient algorithms exist for learning policies for MDPs, these algorithms do not generalize easily to POMDPs. In this paper, we propose a framework for planning and action selection based on probabilistic inference in graphical models. Unlike previous approaches based on MAP inference, our approach utilizes the most probable explanation (MPE) of variables in a graphical model, allowing tractable and efficient inference of actions. It generalizes easily to complex partially observable environments. Furthermore, it allows rewards and costs to be incorporated in a straightforward manner as part of the inference process. We investigate the application of our approach to the problem of robot navigation by testing it on a suite of well-known POMDP benchmarks. Our results demonstrate that the proposed method can beat or match the performance of recently proposed specialized POMDP solvers.
Grounding Verbs of Motion in Natural Language Commands to Robots
Cited by 11 (2 self)
Abstract
To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. An important class of instructions involves interacting with people, such as “Follow the person to the kitchen” or “Meet the person at the elevators.” These instructions require that the robot fluidly react to changes in the environment, not simply follow a precomputed plan. We present an algorithm for understanding natural language commands with three components. First, we create a cost function that scores the language according to how well it matches a candidate plan in the environment, defined as the log-likelihood of the plan given the command. Components of the cost function include novel models for the meanings of motion verbs such as “follow,” “meet,” and “avoid,” as well as spatial relations such as “to” and landmark phrases such as “the kitchen.” Second, an inference method uses this cost function to perform forward search, finding a plan that matches the natural language command. Third, a high-level controller repeatedly calls the inference method at each timestep to compute a new plan in response to changes in the environment such as the movement of the human partner or other people in the scene. When a command consists of more than a single task, the controller switches to the next task when an earlier one is satisfied. We evaluate our approach on a set of example tasks that require the ability to follow both simple and complex natural language commands.