Results 1–10 of 17
Probabilistic Movement Primitives
Cited by 21 (10 self)
Abstract—Movement primitives are a promising approach for modular and reusable movement generation, and are suitable for data-driven movement acquisition. Beneficial properties, such as the simultaneous activation of multiple primitives, optimal movement encoding for stochastic systems, and generalization to new targets, are absent in most common approaches. We propose a probabilistic approach for generating, learning, and reusing movement primitives that overcomes these limitations. We represent a movement primitive as a probability distribution over trajectories. As a consequence, we can activate primitives simultaneously, smoothly blend them together, generalize to new target states, and encode optimal trajectories in stochastic systems. We compare our approach to the existing state of the art and present real-robot results for learning from demonstration.
Bayesian multitask inverse reinforcement learning
Cited by 12 (4 self)
Abstract. We generalise the problem of inverse reinforcement learning to multiple tasks, from a set of demonstrations. Each demonstration may represent one expert trying to solve a different task. Alternatively, one may see each demonstration as given by a different expert trying to solve the same task. Our main technical contribution is to solve the problem by formalising it as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. We show that our methodology allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and imitation learning from multiple teachers.
Multi-Task Reinforcement Learning: Shaping and Feature Selection
Cited by 7 (2 self)
Abstract. Shaping functions can be used in multi-task reinforcement learning (RL) to incorporate knowledge from previously experienced source tasks to speed up learning on a new target task. Earlier work has not clearly motivated choices for the shaping function. This paper discusses and empirically compares several alternatives, and demonstrates that the most intuitive one may not always be the best option. In addition, we extend previous work on identifying good representations for the value and shaping functions, and show that selecting the right representation results in improved generalization over tasks.
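A common concrete instance of a shaping function in this setting is potential-based shaping, where a term γΦ(s′) − Φ(s) is added to the environment reward and the potential Φ can be, for example, a value function learned on a source task. A minimal sketch, with all names and the toy goal illustrative rather than taken from the paper above:

```python
# Minimal sketch of potential-based reward shaping. The potential `phi`
# stands in for knowledge from a source task; names are illustrative.

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Augment the reward r for transition s -> s_next with the
    potential-based shaping term gamma * phi(s_next) - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)

# Example potential: prefer states closer to a goal at x = 10.
phi = lambda s: -abs(10 - s)

# Moving toward the goal earns a shaping bonus ...
assert shaped_reward(0.0, s=4, s_next=5, phi=phi, gamma=1.0) == 1.0
# ... and moving away earns a penalty.
assert shaped_reward(0.0, s=5, s_next=4, phi=phi, gamma=1.0) == -1.0
```

Because the shaping term telescopes along any trajectory, potential-based shaping leaves the optimal policy unchanged while steering exploration, which is one reason it is a natural candidate for transferring source-task knowledge.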
A Probabilistic Approach to Robot Trajectory Generation
Cited by 1 (1 self)
Abstract — Motor Primitives (MPs) are a promising approach for the data-driven acquisition as well as the modular and reusable generation of movements. However, a modular control architecture with MPs is only effective if the MPs support co-activation as well as continuously blending the activation from one MP to the next. In addition, we need efficient mechanisms to adapt an MP to the current situation. Common approaches to movement primitives lack such capabilities, or their implementation is based on heuristics. We present a probabilistic movement primitive approach that overcomes the limitations of existing approaches. We encode a primitive as a probability distribution over trajectories. The representation as a distribution has several beneficial properties: it allows encoding a time-varying variance profile and, most importantly, it allows performing new operations, namely a product of distributions for the co-activation of MPs and conditioning for generalizing the MP to different desired targets. We derive a feedback controller that reproduces a given trajectory distribution in closed form. We compare our approach to the existing state of the art and present real-robot results for learning from demonstration.
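The two operations this abstract highlights, conditioning a trajectory distribution on a target and multiplying distributions for co-activation, reduce to standard Gaussian identities. A deliberately simplified sketch: a real ProMP places the Gaussian over basis-function weights, whereas here it is over just two via-points, and all names are illustrative.

```python
import numpy as np

# Toy trajectory distribution: a Gaussian over two via-points
# [y_start, y_end], standing in for a full ProMP weight distribution.

def condition_on_end(mu, Sigma, target, noise=1e-8):
    """Condition the Gaussian on the final via-point equalling `target`
    (standard Gaussian conditioning; `noise` keeps the update well-posed)."""
    k = Sigma[:, 1] / (Sigma[1, 1] + noise)        # Kalman-style gain
    mu_new = mu + k * (target - mu[1])
    Sigma_new = Sigma - np.outer(k, Sigma[1, :])
    return mu_new, Sigma_new

def product_of_gaussians(mu1, S1, mu2, S2):
    """Co-activating two primitives = taking the product of their Gaussians."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(P1 + P2)
    return S @ (P1 @ mu1 + P2 @ mu2), S

mu = np.array([0.0, 1.0])
Sigma = np.array([[0.5, 0.2],
                  [0.2, 0.5]])
mu_c, Sigma_c = condition_on_end(mu, Sigma, target=2.0)
# mu_c now ends (essentially exactly) at the target, and the correlated
# start point shifts with it, which is what "generalizing to a new desired
# target" means for a distribution-based primitive.
```

Note how both operations stay inside the Gaussian family, which is why they compose cheaply; deterministic primitive representations have no analogue of either.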
Extracting Low-Dimensional Control Variables for Movement Primitives
Cited by 1 (0 self)
Abstract — Movement primitives (MPs) provide a powerful framework for data-driven movement generation that has been successfully applied to learning from demonstrations and robot reinforcement learning. In robotics we often want to solve a multitude of different, but related, tasks. As the parameters of the primitives are typically high-dimensional, a common practice for generalizing movement primitives to new tasks is to adapt only a small set of control variables, also called meta-parameters, of the primitive. Yet, for most MP representations, the encoding of these control variables is pre-coded in the representation and cannot be adapted to the tasks under consideration. In this paper, we learn the encoding of task-specific control variables from data as well, instead of relying on fixed meta-parameter representations. We use hierarchical Bayesian models (HBMs) to estimate a low-dimensional latent variable model for probabilistic movement primitives (ProMPs), a recent movement primitive representation. We show on two real-robot datasets that ProMPs based on HBMs outperform standard ProMPs in terms of generalization and learning from small amounts of data, and also allow for an intuitive analysis of the movement. We further extend our HBM with a mixture model, so that we can model different movement types in the same dataset.
Online Multi-Task Learning for Policy Gradient Methods
Cited by 1 (0 self)
Policy gradient algorithms have shown considerable recent success in solving high-dimensional sequential decision-making tasks, particularly in robotics. However, these methods often require extensive experience in a domain to achieve high performance. To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision-making tasks consecutively, transferring knowledge between tasks to accelerate learning. Our approach provides robust theoretical guarantees, and we show empirically that it dramatically accelerates learning on a variety of dynamical systems, including an application to quadrotor control.
Bayesian Hierarchical Reinforcement Learning
Cited by 1 (0 self)
We describe an approach to incorporating Bayesian priors in the MAXQ framework for hierarchical reinforcement learning (HRL). We define priors on the primitive environment model and on task pseudo-rewards. Since models for composite tasks can be complex, we use a mixed model-based/model-free learning approach to find an optimal hierarchical policy. We show empirically that (i) our approach results in improved convergence over non-Bayesian baselines, (ii) using both task hierarchies and Bayesian priors is better than either alone, (iii) taking advantage of the task hierarchy reduces the computational cost of Bayesian reinforcement learning, and (iv) in this framework, task pseudo-rewards can be learned instead of being manually specified, leading to hierarchically optimal rather than recursively optimal policies.
Better Optimism By Bayes: Adaptive Planning with Rich Models, 1402
Cited by 1 (0 self)
The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning, but only for simplistic models; or powerful Bayesian nonparametric models, but using simple, myopic planning strategies such as Thompson sampling. We ask whether it is feasible and truly beneficial to combine rich probabilistic models with a closer approximation to fully Bayesian planning. First, we use a collection of counterexamples to show formal problems with the over-optimism inherent in Thompson sampling. Then we leverage state-of-the-art techniques in efficient Bayes-adaptive planning and nonparametric Bayesian methods to perform qualitatively ...
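The "myopic" strategy this abstract criticizes is easy to make concrete: for a Bernoulli bandit, Thompson sampling draws one success probability per arm from the posterior and acts greedily on the draws, planning only a single step ahead rather than over the full Bayes-adaptive future. A toy sketch under those assumptions (the arm probabilities and counts below are made up for illustration):

```python
import random

def thompson_step(alpha, beta_):
    """One Thompson-sampling step for a Bernoulli bandit with Beta posteriors:
    sample a success probability per arm, then pick the arm whose sample is
    largest. Myopic in the sense above: no multi-step planning is done."""
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta_)]
    return max(range(len(samples)), key=samples.__getitem__)

def update(arm, reward, alpha, beta_):
    # Conjugate Beta update from an observed 0/1 reward.
    alpha[arm] += reward
    beta_[arm] += 1 - reward

# Toy simulation: two arms with true success rates 0.9 and 0.1.
random.seed(0)
alpha, beta_ = [1.0, 1.0], [1.0, 1.0]
p_true = [0.9, 0.1]
for _ in range(500):
    arm = thompson_step(alpha, beta_)
    update(arm, int(random.random() < p_true[arm]), alpha, beta_)
# The posterior mean of arm 0 now dominates that of arm 1.
```

Even in this benign setting, each decision uses only one posterior sample, which is the source of the over-optimism the counterexamples in the paper formalize.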
Aggression and anxiety: social context and neurobiological ...
, 2010
Bayesian Policy Gradient and Actor-Critic Algorithms, 2016
Abstract Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Many conventional policy gradient methods use Monte Carlo techniques to estimate this gradient. The policy is improved by adjusting the parameters in the direction of the gradient estimate. Since Monte Carlo methods tend to have high variance, a large number of samples is required to attain accurate estimates, resulting in slow convergence. In this paper, we first propose a Bayesian framework for policy gradient, based on modeling the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates, namely the gradient covariance, are provided at little extra cost. Since the proposed Bayesian framework considers system trajectories as its basic observable unit, it does not require the dynamics within trajectories to be of any particular form, and thus can be easily extended to partially observable problems. On the downside, it cannot take advantage of the Markov property when the system is Markovian. To address this issue, we proceed to supplement our Bayesian policy gradient framework with a new actor-critic learning model in which a Bayesian class of nonparametric critics, based on Gaussian process temporal difference learning, is used. Such critics model the action-value function as a Gaussian process, allowing Bayes' rule to be used in computing the posterior distribution over action-value functions, conditioned on the observed data. Appropriate choices of the policy parameterization and of the prior covariance (kernel) between action-values allow us to obtain closed-form expressions for the posterior distribution of the gradient of the expected return with respect to the policy parameters.
We perform detailed experimental comparisons of the proposed Bayesian policy gradient and actor-critic algorithms with classic Monte Carlo based policy gradient methods, as well as with each other, on a number of reinforcement learning problems.
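The Monte Carlo baseline these abstracts refer to is the classic score-function (REINFORCE) estimator, which averages ∇ log π(a) times the return over sampled actions. A one-parameter sketch showing both the estimator and the slow convergence that motivates the Bayesian treatment (the toy policy and reward are illustrative, not the authors' setup):

```python
import math
import random

def mc_policy_gradient(theta, n_samples):
    """Score-function (REINFORCE) estimate of d E[R] / d theta for a
    one-parameter Bernoulli policy pi(a=1) = sigmoid(theta), with a toy
    reward R = 1 when a = 1 and R = 0 otherwise."""
    p = 1.0 / (1.0 + math.exp(-theta))
    total = 0.0
    for _ in range(n_samples):
        a = 1 if random.random() < p else 0
        r = float(a)              # reward favors action 1
        score = a - p             # d log pi(a) / d theta for this policy
        total += score * r
    return total / n_samples

random.seed(1)
est = mc_policy_gradient(theta=0.0, n_samples=20000)
# The true gradient is d sigmoid / d theta = p * (1 - p) = 0.25 at theta = 0;
# even 20,000 samples leave visible Monte Carlo noise around that value.
```

The per-sample variance of `score * r` is exactly what a Gaussian-process model of the gradient is meant to tame: it pools information across samples and reports a covariance alongside the estimate.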