Results 1–10 of 115
Policy search for motor primitives in robotics
Advances in Neural Information Processing Systems 22 (NIPS 2008), 2009
Cited by 117 (24 self)
Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives, as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems, often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate-reward case to episodic reinforcement learning. We show that this results in a general, common framework, also connected to policy gradient methods, that yields a novel algorithm for policy learning which is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm.
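In its simplest form, the EM-inspired update this abstract describes reduces to reward-weighted averaging of exploration noise. A minimal sketch (not the paper's algorithm; the `rollout_reward` callback, the shifted weighting, and all constants are illustrative assumptions):

```python
import numpy as np

def em_policy_update(theta, sigma, n_rollouts, rollout_reward, rng):
    """One EM-style reward-weighted update of policy parameters.

    Samples exploratory parameters around theta, weights each sample by its
    (shifted, non-negative) return, and averages -- no learning rate needed.
    """
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
    returns = np.array([rollout_reward(theta + e) for e in eps])
    w = returns - returns.min()        # shift so weights are non-negative
    if w.sum() == 0.0:
        return theta                   # all rollouts equal: no preference
    return theta + (w[:, None] * eps).sum(axis=0) / w.sum()
```

With a decaying exploration variance, repeated updates climb a simple reward surface toward its maximum.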
Natural Actor-Critic
2007
Cited by 90 (10 self)
Abstract
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing, as these are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient-compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
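The coordinate-frame point can be seen in a toy case: for a one-dimensional Gaussian policy N(mu, sigma^2), the Fisher information of the mean is 1/sigma^2, so preconditioning by its inverse simply rescales the vanilla gradient by sigma^2. A hypothetical continuum-armed bandit sketch (not the paper's actor-critic; all constants are illustrative):

```python
import numpy as np

def natural_pg_step(mu, sigma, reward, n, lr, rng):
    """One natural policy-gradient step for a Gaussian policy N(mu, sigma^2).

    The Fisher information of mu is 1/sigma^2, so the natural gradient is
    sigma^2 times the vanilla likelihood-ratio gradient estimate.
    """
    a = rng.normal(mu, sigma, size=n)            # sample actions
    r = reward(a)
    score = (a - mu) / sigma ** 2                # d log pi / d mu
    vanilla = np.mean(score * (r - r.mean()))    # batch-mean baseline cuts variance
    return mu + lr * sigma ** 2 * vanilla        # natural-gradient step
```

Iterating this on a quadratic reward drives the policy mean to the optimal action.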
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In Proceedings of the International Conference on Machine Learning, 2011
Cited by 78 (14 self)
Abstract
In this paper, we introduce pilco, a practical, data-efficient model-based policy search method. Pilco reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, pilco can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
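The model-based loop can be caricatured with a least-squares linear dynamics model standing in for PILCO's Gaussian process, and finite differences standing in for its analytic gradients. Everything here (the system, the linear policy class, and the constants) is an illustrative assumption, not the paper's method:

```python
import numpy as np

def fit_linear_model(X, U, Xn):
    """Fit x' ~ A x + B u by least squares (stand-in for PILCO's GP model)."""
    Z = np.hstack([X, U])
    W, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    return W.T                                   # shape (dim_x, dim_x + dim_u)

def simulate_cost(W, K, x0, horizon, x_goal):
    """Roll the learned model forward under the linear policy u = K x."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        x = W @ np.concatenate([x, K @ x])
        cost += np.sum((x - x_goal) ** 2)
    return cost

def improve_policy(W, K, x0, horizon, x_goal, steps=300, lr=0.2, eps=1e-4):
    """Gradient descent on the simulated cost via central finite differences
    (PILCO differentiates the planning step analytically instead)."""
    K = K.copy()
    for _ in range(steps):
        g = np.zeros_like(K)
        for idx in np.ndindex(*K.shape):
            Kp, Km = K.copy(), K.copy()
            Kp[idx] += eps
            Km[idx] -= eps
            g[idx] = (simulate_cost(W, Kp, x0, horizon, x_goal)
                      - simulate_cost(W, Km, x0, horizon, x_goal)) / (2 * eps)
        K -= lr * g
    return K
```

On the scalar system x' = 0.9x + 0.2u, the fit is exact and the optimized gain drives the closed-loop pole toward zero.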
A generalized path integral control approach to reinforcement learning
The Journal of Machine Learning Research, 2010
Cited by 68 (13 self)
Abstract
With the goal of generating more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests using the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model-free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities, as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12-degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI²) currently offers one of the most efficient, numerically robust, and easy-to-implement algorithms for RL based on trajectory rollouts.
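The "no open parameters beyond the exploration noise" claim comes from the path-integral weighting: exponentiated, normalized cost replaces both matrix inversion and a learning rate. A bare-bones sketch of that core (per-rollout weighting only; the real PI² weights per time step along each trajectory, and the temperature and constants below are illustrative):

```python
import numpy as np

def pi2_update(theta, sigma, n_rollouts, rollout_cost, lam, rng):
    """One PI^2-flavored update: exploration noise averaged with soft-max
    weights exp(-S/lambda) over rollout costs S."""
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
    S = np.array([rollout_cost(theta + e) for e in eps])
    S = (S - S.min()) / max(S.max() - S.min(), 1e-12)  # normalize costs to [0, 1]
    P = np.exp(-S / lam)
    P /= P.sum()                                       # path-integral probabilities
    return theta + P @ eps
```

With decaying exploration noise, repeated updates concentrate on the low-cost region.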
Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach
Cited by 59 (10 self)
Abstract — Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high-dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities, as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI²), currently offers one of the most efficient, numerically robust, and easy-to-implement algorithms for RL in robotics.
Robot Motor Skill Coordination with EM-based Reinforcement Learning
Cited by 51 (8 self)
Abstract — We present an approach allowing a robot to acquire new motor skills by learning the couplings across motor control variables. The demonstrated skill is first encoded in a compact form through a modified version of Dynamic Movement Primitives (DMP) which encapsulates correlation information. Expectation-Maximization-based Reinforcement Learning is then used to modulate the mixture of dynamical systems initialized from the user’s demonstration. The approach is evaluated on a torque-controlled 7-DOF Barrett WAM robotic arm. Two skill learning experiments are conducted: a reaching task where the robot needs to adapt the learned movement to avoid an obstacle, and a dynamic pancake-flipping task.
Learning Motor Primitives for Robotics
Cited by 49 (5 self)
Abstract — The acquisition and self-improvement of novel motor skills is among the most important problems in robotics. Motor primitives offer one of the most promising frameworks for the application of machine learning techniques in this context. Employing an improved form of the dynamic systems motor primitives originally introduced by Ijspeert et al. [2], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning. To do so, we present both learning algorithms and representations targeted for the practical application in robotics. Furthermore, we show that it is possible to include a start-up phase in rhythmic primitives. We show that two new motor skills, i.e., Ball-in-a-Cup and Ball-Paddling, can be learned on a real Barrett WAM robot arm at a pace similar to human learning while achieving a significantly more reliable final performance.
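The dynamic systems motor primitives referred to here combine a spring-damper "transformation system" with a phase-driven forcing term; whatever weights are learned, the trajectory still converges to the goal because the forcing decays with the phase. A minimal integration sketch of the basic discrete primitive (the basis widths, gains, and Euler integrator are illustrative choices, not the paper's modified formulation):

```python
import numpy as np

def dmp_rollout(y0, goal, w, T=1.5, dt=0.001, alpha=25.0, beta=6.25, ax=3.0):
    """Integrate a simplified discrete dynamic movement primitive.

    Canonical system: x' = -ax * x (phase decays from 1 to 0).
    Transformation system: y'' = alpha * (beta * (goal - y) - y') + f(x),
    where f is a phase-gated mixture of Gaussian basis functions.
    """
    n = len(w)
    c = np.exp(-ax * np.linspace(0.0, 1.0, n))   # basis centers in phase space
    h = n ** 1.5 / c                             # widths: narrower for later centers
    y, yd, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        f = x * (goal - y0) * (psi @ w) / (psi.sum() + 1e-10)
        ydd = alpha * (beta * (goal - y) - yd) + f
        x += -ax * x * dt                        # phase decays, gating f to zero
        yd += ydd * dt
        y += yd * dt
        traj.append(y)
    return np.array(traj)
```

Zero weights give a pure point attractor; nonzero weights reshape the path but leave the endpoint at the goal.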
Natural Evolution Strategies
Cited by 42 (23 self)
Abstract — This paper presents Natural Evolution Strategies (NES), a novel algorithm for performing real-valued ‘black box’ function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method. Natural Evolution Strategies search the fitness landscape using a multivariate normal distribution with a self-adapting mutation matrix to generate correlated mutations in promising regions. NES shares this property with Covariance Matrix Adaptation (CMA), an Evolution Strategy (ES) which has been shown to perform well on a variety of high-precision optimization tasks. The Natural Evolution Strategies algorithm, however, is simpler, less ad hoc, and more principled. Self-adaptation of the mutation matrix is derived using a Monte Carlo estimate of the natural gradient towards better expected fitness. By following the natural gradient instead of the ‘vanilla’ gradient, we can ensure efficient update steps while preventing early convergence due to overly greedy updates, resulting in reduced sensitivity to local suboptima. We show NES has competitive performance with CMA on unimodal tasks, while outperforming it on several multimodal tasks that are rich in deceptive local optima.
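A separable (diagonal) simplification of the idea fits in a few lines: rank-based utilities plus Monte Carlo natural-gradient updates of the search distribution's mean and per-coordinate step sizes. This is a simplified variant, not the paper's full mutation-matrix algorithm, and all constants are illustrative:

```python
import numpy as np

def snes_step(mu, sigma, f, n, eta_mu, eta_sigma, rng):
    """One separable-NES-style step for maximizing f.

    Ranks are converted to fixed utilities (robust to fitness scaling), then
    used as weights in natural-gradient estimates for the mean and the log
    step sizes of the Gaussian search distribution.
    """
    s = rng.normal(size=(n, mu.size))                    # standard-normal mutations
    fit = np.array([f(mu + sigma * sk) for sk in s])
    u = np.zeros(n)
    u[np.argsort(-fit)] = np.linspace(1.0, -1.0, n) / n  # rank-based utilities
    grad_mu = u @ s                                      # natural gradient wrt mean
    grad_log_sigma = u @ (s ** 2 - 1.0)                  # natural gradient wrt log sigma
    return (mu + eta_mu * sigma * grad_mu,
            sigma * np.exp(0.5 * eta_sigma * grad_log_sigma))
```

On a smooth unimodal objective, the mean converges to the optimum while the step sizes shrink automatically.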
Active policy learning for robot planning and exploration under uncertainty
In Proceedings of Robotics: Science and Systems, 2007
Cited by 39 (5 self)
Abstract
This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially observed sequential decision processes. The algorithm is tested in the domain of robot navigation and exploration under uncertainty. In such a setting, the expected cost, which must be minimized, is a function of the belief state (filtering distribution). This filtering distribution is in turn nonlinear and subject to discontinuities, which arise because of constraints in the robot motion and control models. As a result, the expected cost is non-differentiable and very expensive to simulate. The new algorithm overcomes the first difficulty and reduces the number of required simulations as follows. First, it assumes that we have carried out previous simulations which returned values of the expected cost for different corresponding policy parameters. Second, it fits a Gaussian process (GP) regression model to these values, so as to approximate the expected cost as a function of the policy parameters. Third, it uses the GP predicted mean and variance to construct a statistical measure that determines which policy parameters should be used in the next simulation. The process is then repeated using the new parameters and the newly gathered expected cost observation. Since the objective is to find the policy parameters that minimize the expected cost, this iterative active learning approach effectively trades off between exploration (in regions where the GP variance is large) and exploitation (where the GP mean is low). In our experiments, a robot uses the proposed algorithm to plan an optimal path for accomplishing a series of tasks, while maximizing the information about its pose and map estimates. These estimates are obtained with a standard filter for simultaneous localization and mapping. Upon gathering new observations, the robot updates the state estimates and is able to replan a new path in the spirit of open-loop feedback control.
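The three-step loop described above (fit a GP to simulated costs, build a statistical acquisition criterion, simulate where it is optimized) is essentially Bayesian optimization. A compact sketch with an RBF-kernel GP and a lower-confidence-bound criterion; the kernel, its hyperparameters, and the grid-based minimization are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def gp_posterior(xs, X, y, ell=0.3, sf=1.0, jitter=1e-6):
    """GP posterior mean and variance at query points xs (1-D inputs, RBF kernel)."""
    def k(a, b):
        return sf ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(X, X) + jitter * np.eye(len(X))
    Ks = k(xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = sf ** 2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def active_policy_search(cost, grid, n_iter, kappa=2.0):
    """Simulate next where the lower confidence bound mean - kappa*sd is
    smallest: large-variance regions get explored, low-mean regions exploited."""
    X = np.array([grid[0], grid[-1]])              # two initial simulations
    y = np.array([cost(x) for x in X])
    for _ in range(n_iter):
        mean, var = gp_posterior(grid, X, y)
        x_next = grid[np.argmin(mean - kappa * np.sqrt(var))]
        X = np.append(X, x_next)
        y = np.append(y, cost(x_next))
    return X[np.argmin(y)]                         # best simulated policy parameter
```

Because each simulation is assumed expensive, the GP surrogate rather than the simulator decides where to evaluate next; a handful of evaluations suffices to localize the minimum of a smooth one-dimensional cost.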