Results 1–10 of 19
Reinforcement Learning with Sequences of Motion Primitives for Robust Manipulation
This is a preprint from 16.07.2012 and differs from the final published version on IEEE Xplore. © 2012 IEEE.
Cited by 18 (4 self)
Relative Entropy and Free Energy Dualities: Connections to Path Integral and KL control
Cited by 8 (4 self)
Abstract — This paper integrates recent work on Path Integral …
Tendon-Driven Control of Biomechanical and Robotic Systems: A Path Integral Reinforcement Learning Approach
Cited by 5 (2 self)
Abstract — We apply path integral reinforcement learning to a biomechanically accurate dynamics model of the index finger and then to the Anatomically Correct Testbed (ACT) robotic hand. We illustrate the applicability of Policy Improvement with Path Integrals (PI²) to parameterized and non-parameterized control policies. This method is based on sampling variations in control, executing them in the real world, and minimizing a cost function on the resulting performance. Iteratively improving the control policy based on real-world performance requires no direct modeling of tendon-network nonlinearities and contact transitions, allowing improved task performance.
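The sampling-based improvement loop described in this abstract can be sketched as a minimal PI²-style parameter update. This is an illustrative sketch, not the paper's implementation: the quadratic stand-in cost, the Gaussian exploration, and all parameter names are assumptions.

```python
import numpy as np

def pi2_update(theta, rollout_cost, n_samples=20, sigma=0.1, lam=1.0):
    """One PI2-style update: sample parameter perturbations, evaluate
    each perturbed policy (a stand-in for a real-world rollout), and
    average the noise weighted by the exponentiated negative cost."""
    eps = sigma * np.random.randn(n_samples, theta.size)      # exploration noise
    costs = np.array([rollout_cost(theta + e) for e in eps])  # one rollout per sample
    costs -= costs.min()                                      # numerical stabilization
    w = np.exp(-costs / lam)                                  # low cost -> high weight
    w /= w.sum()
    return theta + w @ eps                                    # cost-weighted noise average

# Toy usage: a quadratic cost standing in for task performance.
np.random.seed(0)
theta = np.ones(3)
cost = lambda th: float(np.sum(th ** 2))
for _ in range(200):
    theta = pi2_update(theta, cost)
```

The update needs no gradient of the cost, which is what lets the method run directly on hardware with unmodeled tendon dynamics.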
Linear Hamilton-Jacobi-Bellman Equations in High Dimensions
in Conference on Decision and Control (CDC), 2014; arXiv preprint arXiv:1404.1089
Cited by 5 (3 self)
The Hamilton-Jacobi-Bellman (HJB) equation provides the globally optimal solution to large classes of control problems. Unfortunately, this generality comes at a price: the calculation of such solutions is typically intractable for systems with more than moderate state-space size, due to the curse of dimensionality. This work combines recent results on the structure of the HJB equation, and its reduction to a linear Partial Differential Equation (PDE), with methods based on low-rank tensor representations, known as separated representations, to address the curse of dimensionality. The result is an algorithm for solving optimal control problems that scales linearly with the number of states in a system and is applicable to systems that are nonlinear with stochastic forcing in finite-horizon, average-cost, and first-exit settings. The method is demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with system dimensions two, six, and twelve respectively.
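The low-rank structure the abstract relies on can be stated in one line; this sketch uses the standard definition of a separated (canonical) tensor representation, not necessarily the paper's exact notation:

```latex
% A function of d state variables is approximated by a sum of r
% separable terms:
\Psi(x_1,\dots,x_d) \;\approx\; \sum_{l=1}^{r} \prod_{i=1}^{d} \psi_i^{l}(x_i)
```

With n grid points per dimension, storage and per-operator work scale like O(r d n) rather than the O(n^d) of a full grid, which is what yields the linear scaling in the number of states claimed above.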
Learning Control in Robotics: Trajectory-Based Optimal Control Techniques
2010
Cited by 4 (0 self)
In a not-too-distant future, robots will be a natural part of daily life in human society, providing assistance in many areas ranging from clinical applications, education and care-giving, to normal household environments [1]. It is hard to imagine that all possible tasks can be preprogrammed into such robots. Robots need to be able to learn, either by themselves or with the help of human supervision. Additionally, wear and tear on robots in daily use needs to be automatically compensated for, which requires a form of continuous self-calibration, another form of learning. Finally, robots need to react to stochastic and dynamic environments, i.e., they need to learn how to optimally adapt to uncertainty and unforeseen changes. Robot learning is going to be a key ingredient for the future of autonomous robots. While robot learning covers a rather large field, from learning …
Semidefinite Relaxations for Stochastic Optimal Control Policies
arXiv.org, 2014
Cited by 3 (3 self)
Abstract — Recent results in the study of the Hamilton-Jacobi-Bellman (HJB) equation have led to the discovery of a formulation of the value function as a linear Partial Differential Equation (PDE) for stochastic nonlinear systems with a mild constraint on their disturbances. This has yielded promising directions for research in the planning and control of nonlinear systems. This work proposes a new method for obtaining approximate solutions to these linear stochastic optimal control (SOC) problems. A candidate polynomial with variable coefficients is proposed as the solution to the SOC problem. A Sum of Squares (SOS) relaxation is then applied to the partial differential constraints, leading to a hierarchy of semidefinite relaxations with improving suboptimality gap. The resulting approximate solutions are shown to be guaranteed over- and under-approximations of the optimal value function.
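One way to read the relaxation step: the linear PDE is an equality constraint on the candidate polynomial, and replacing the equality by an SOS-certified inequality turns it into semidefinite constraints on the coefficients. The operator, signs, and symbols below follow one common desirability convention and are illustrative, not the paper's exact formulation:

```latex
% Linear (desirability-form) HJB operator on a polynomial candidate
% \Psi_c with free coefficient vector c:
\mathcal{L}[\Psi_c] \;=\; -\tfrac{q(x)}{\lambda}\,\Psi_c
   \;+\; f(x)^{\top}\nabla\Psi_c
   \;+\; \tfrac{1}{2}\operatorname{tr}\!\big(\Sigma\,\nabla^2\Psi_c\big)
% Relaxing the PDE equality \mathcal{L}[\Psi_c] = 0 to a one-sided,
% SOS-certified inequality,
\mathcal{L}[\Psi_c](x) \;\succeq_{\mathrm{SOS}}\; 0,
% yields a semidefinite program; each direction of the inequality
% bounds the true solution from one side, and hence bounds the
% value function V = -\lambda \log \Psi from the other.
```

Raising the degree of the candidate polynomial and of the SOS multipliers gives the hierarchy with shrinking suboptimality gap mentioned in the abstract.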
Tendon-driven variable impedance control using reinforcement learning
2012
Cited by 1 (0 self)
Abstract — Biological motor control is capable of learning complex movements containing contact transitions and unknown force requirements while adapting the impedance of the system. In this work, we seek to achieve robotic mimicry of this compliance, employing stiffness only when it is necessary for task completion. We use path integral reinforcement learning, which has been successfully applied on torque-driven systems to learn episodic tasks without using explicit models. Applying this method to tendon-driven systems is challenging because of the increase in dimensionality, the intrinsic nonlinearities of such systems, and the increased effect of external dynamics on the lighter tendon-driven end effectors. We demonstrate the simultaneous learning of feedback gains and desired tendon trajectories in a dynamically complex sliding-switch task with a tendon-driven robotic hand. The learned controls look noisy but nonetheless result in smooth and expert task performance. We show discovery of dynamic strategies not explored in a demonstration, and that the learned strategy is useful for understanding difficult-to-model plant characteristics.
Stochastic Optimal Control for Nonlinear Markov Jump Diffusion Processes
Cited by 1 (1 self)
Abstract — We consider the problem of finite-horizon stochastic optimal control for nonlinear Markov jump diffusion processes. In particular, by using stochastic calculus for Markov jump diffusion processes and the logarithmic transformation of the value function, we demonstrate the transformation of the corresponding Hamilton-Jacobi-Bellman (HJB) Partial Differential Equation (PDE) into the backward Chapman-Kolmogorov PDE for jump diffusions. Furthermore, we derive the Feynman-Kac lemma for nonlinear Markov jump diffusion processes and apply it to the transformed HJB equation. Application of the Feynman-Kac lemma yields the solution of the transformed HJB equation. The path integral interpretation is derived. Finally, conclusions and future directions are discussed.
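The transformation chain in this abstract has a compact standard form; the sketch below follows the usual path-integral convention (λ couples the cost scale to the noise scale, q is the running state cost, φ the terminal cost) and suppresses the jump terms:

```latex
% Logarithmic transformation of the value function:
V(x,t) \;=\; -\lambda \log \Psi(x,t)
% This turns the nonlinear HJB PDE into a linear backward
% Chapman-Kolmogorov PDE for the desirability \Psi, whose solution
% the Feynman-Kac lemma expresses as an expectation over
% uncontrolled (jump) diffusion paths:
\Psi(x,t) \;=\; \mathbb{E}\!\left[\exp\!\Big(
   -\tfrac{1}{\lambda}\int_{t}^{T} q(x_s)\,ds
   \;-\; \tfrac{1}{\lambda}\,\phi(x_T)\Big)\,\Big|\; x_t = x\right]
```

For jump diffusions, the expectation is over sample paths of the jump diffusion itself, and the linear PDE picks up the corresponding integro-differential jump generator.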
From Information-Theoretic Dualities to Path Integral and Kullback-Leibler Control: Continuous- and Discrete-Time Formulations
Cited by 1 (1 self)
Abstract — This paper presents a unified view of stochastic optimal control theory as developed within the machine learning and control theory communities. In particular, we show the mathematical connection between recent work on Path Integral (PI) and Kullback-Leibler (KL) divergence stochastic optimal control theory and earlier work on risk sensitivity and the fundamental dualities between free energy and relative entropy. We discuss the applications of the relationship between free energy and relative entropy to nonlinear stochastic dynamical systems affine in noise and nonlinear stochastic dynamics affine in control and noise. For this last class of systems, we provide the PI optimal control and its iterative formulation. In addition, we present the connection of PI control derived via Dynamic Programming with the information-theoretic dualities. Finally, we provide links to KL stochastic optimal control and discuss generalizations and future work.
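The duality this abstract builds on is the classical variational formula relating free energy and relative entropy; stated here in the standard form (J a path cost, p the uncontrolled path measure, λ > 0), which is a known identity rather than this paper's specific derivation:

```latex
% Free-energy / relative-entropy duality:
-\lambda \log \mathbb{E}_{p}\!\left[e^{-J/\lambda}\right]
  \;=\; \min_{q}\Big( \mathbb{E}_{q}[J]
  \;+\; \lambda\,\mathrm{KL}(q \,\|\, p) \Big),
% with the minimizer being the exponentially tilted measure
dq^{*} \;\propto\; e^{-J/\lambda}\, dp
```

The left side is the risk-sensitive (free-energy) cost; the right side is the KL-control objective, which is how the PI and KL formulations end up describing the same optimal path distribution.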
Path Integral Formulation of Stochastic Optimal Control with Generalized Costs
Cited by 1 (1 self)
Abstract: Path integral control solves a class of stochastic optimal control problems with a Monte Carlo (MC) method for an associated Hamilton-Jacobi-Bellman (HJB) equation. The MC approach avoids the need for a global grid over the domain of the HJB equation, and therefore path integral control is in principle applicable to control problems of moderate to large dimension. The class of problems path integral control can solve, however, is defined by requirements on the cost function, the noise covariance matrix, and the control input matrix. We relax the requirements on the cost function by introducing a new state that represents an augmented running cost. In our new formulation the cost function can contain stochastic integral terms and linear control costs, which are important in applications in engineering, economics, and finance. We present an efficient numerical implementation of our grid-free MC approach and demonstrate its performance and usefulness in examples from hierarchical electric load management. The dimension of one of our examples is large enough to make classical grid-based HJB solvers impractical.
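The grid-free MC approach has a very small core, sketched here for a toy 1-D problem under the standard path-integral formulation rather than the paper's augmented-cost one; the dynamics, cost q(x) = x², and all parameters are illustrative assumptions:

```python
import numpy as np

def pi_control(x0, n_paths=5000, n_steps=50, dt=0.02,
               sigma=1.0, lam=1.0, seed=0):
    """Grid-free Monte Carlo estimate of a path-integral optimal control
    for 1-D passive dynamics dx = sigma dW with running cost q(x) = x**2."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    S = np.zeros(n_paths)                       # accumulated running cost per path
    dW = rng.normal(0.0, np.sqrt(dt), (n_steps, n_paths))
    for k in range(n_steps):
        S += (x ** 2) * dt                      # q(x) dt along each rollout
        x += sigma * dW[k]                      # uncontrolled (passive) dynamics
    w = np.exp(-(S - S.min()) / lam)            # exponentiated-cost path weights
    w /= w.sum()
    # Control estimate at (x0, 0): cost-weighted average of the first
    # noise increment, rescaled to a rate.
    return sigma * float(w @ dW[0]) / dt

u_right = pi_control(2.0)     # state above the origin: control pushes down
u_left = pi_control(-2.0)     # state below the origin: control pushes up
```

No grid over the state space is ever built; only sampled rollouts are stored, which is why the approach remains usable at the problem dimensions the abstract mentions.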