Results 1 - 10 of 84
Reinforcement Learning in Robotics: A Survey
"... Reinforcement learning offers to robotics a framework and set oftoolsfor the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between di ..."
Abstract
-
Cited by 39 (2 self)
- Add to MetaCart
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy search methods. By analyzing a simple problem in some detail, we demonstrate how reinforcement learning approaches may be profitably applied ...
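A minimal sketch of the two families this survey contrasts, on a toy two-armed bandit: a tabular value-function update (Q-learning style) next to a REINFORCE-style policy search update. Everything below is illustrative and assumed, not code from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)
true_reward = np.array([0.2, 0.8])           # hypothetical expected rewards

# Value-function-based: tabular Q-learning-style update with eps-greedy acting.
Q, alpha = np.zeros(2), 0.1
for _ in range(500):
    a = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(Q))
    r = true_reward[a] + 0.1 * rng.normal()
    Q[a] += alpha * (r - Q[a])               # the TD target for a bandit is just r

# Policy search: REINFORCE-style gradient on softmax action preferences.
theta = np.zeros(2)
for _ in range(500):
    p = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(2, p=p)
    r = true_reward[a] + 0.1 * rng.normal()
    grad = -p
    grad[a] += 1.0                           # d log pi(a) / d theta for a softmax
    theta += 0.1 * r * grad                  # ascend the expected reward

print(np.argmax(Q), np.argmax(theta))        # both should come to prefer action 1
```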
Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning
"... Abstract—Over the last years, there has been substantial progress in robust manipulation in unstructured environments. The long-term goal of our work is to get away from precise, but very expensive robotic systems and to develop affordable, potentially imprecise, self-adaptive manipulator systems th ..."
Abstract
-
Cited by 31 (13 self)
- Add to MetaCart
(Show Context)
In recent years, there has been substantial progress in robust manipulation in unstructured environments. The long-term goal of our work is to get away from precise, but very expensive robotic systems and to develop affordable, potentially imprecise, self-adaptive manipulator systems that can interactively perform tasks such as playing with children. In this paper, we demonstrate how a low-cost off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, starting from scratch. Our manipulator is inaccurate and provides no pose feedback. For learning a controller in the work space of a Kinect-style depth camera, we use a model-based reinforcement learning technique. Our learning method is data efficient, reduces model bias, and deals with several noise sources in a principled way during long-term planning. We present a way of incorporating state-space constraints into the learning process and analyze the learning gain obtained by exploiting the sequential structure of the stacking task.
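A minimal sketch of the model-based loop the abstract describes: interact with the real system, refit a dynamics model, and improve the policy entirely inside the model. The authors use a probabilistic (Gaussian process) model and gradient-based policy optimization; the linear model and random search below are simplified stand-ins, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def env_step(x, u):
    """Stand-in for the real robot: toy double-integrator dynamics."""
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])

def fit_dynamics(data):
    """Least-squares linear model x' ~ [x; u] @ A (stand-in for the paper's GP)."""
    Xu = np.array([np.concatenate([x, u]) for x, u, _ in data])
    Xn = np.array([xn for _, _, xn in data])
    A, *_ = np.linalg.lstsq(Xu, Xn, rcond=None)
    return lambda x, u: np.concatenate([x, u]) @ A

theta = np.zeros(2)                          # linear state-feedback gains
data = []
for episode in range(10):
    # 1) interact with the real system under the current (noisy) policy
    x = np.array([1.0, 0.0])
    for _ in range(20):
        u = np.array([theta @ x]) + 0.05 * rng.normal(size=1)
        x_next = env_step(x, u)
        data.append((x, u, x_next))
        x = x_next
    # 2) refit the model from all data, 3) improve the policy in simulation only
    model = fit_dynamics(data)
    def predicted_cost(th):
        x = np.array([1.0, 0.0])
        for _ in range(20):
            x = model(x, np.array([th @ x]))
        return float(np.linalg.norm(x))      # drive the state to the origin
    candidates = [theta + 0.1 * rng.normal(size=2) for _ in range(50)]
    theta = min(candidates + [theta], key=predicted_cost)
```

Because each policy update is evaluated on the model rather than the robot, real interactions are only needed to gather data, which is where the data efficiency comes from.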
Hierarchical relative entropy policy search
In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012
"... Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an agent’s policy may well be the key to improved scalability and higher per-formance. However, such hierarchical struc-tures cannot be exploited by current policy search algorithms. We will concentrate ..."
Abstract
-
Cited by 25 (9 self)
- Add to MetaCart
(Show Context)
Many real-world problems are inherently hierarchically structured. The use of this structure in an agent's policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We concentrate on a basic, but highly relevant hierarchy: the 'mixed option' policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also improving learning speed and the quality of the found policy in comparison to the non-hierarchical approach.
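A minimal sketch of the 'mixed option' policy from the abstract, assuming a softmax gating network over options and a Gaussian policy per option; the class and parameter names are hypothetical, not the authors' implementation. The sampled option is exactly the latent variable that the paper's EM-style reformulation infers during learning.

```python
import numpy as np

rng = np.random.default_rng(0)

class MixedOptionPolicy:
    """Gating network p(o | s) over options; Gaussian option-policy pi(a | s, o)."""
    def __init__(self, n_options, state_dim, action_dim):
        self.W_gate = 0.1 * rng.normal(size=(n_options, state_dim))
        self.W_opt = 0.1 * rng.normal(size=(n_options, action_dim, state_dim))
        self.sigma = 0.1                      # fixed exploration noise

    def act(self, s):
        logits = self.W_gate @ s
        gate = np.exp(logits - logits.max())
        gate /= gate.sum()                    # softmax: p(o | s)
        o = rng.choice(len(gate), p=gate)     # the latent option
        mean = self.W_opt[o] @ s              # option-policy picks the action
        return o, mean + self.sigma * rng.normal(size=mean.shape)

policy = MixedOptionPolicy(n_options=3, state_dim=4, action_dim=2)
option, action = policy.act(rng.normal(size=4))
```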
Data-Efficient Generalization of Robot Skills with Contextual Policy Search
"... In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical ap ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
(Show Context)
In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach to policy learning: a lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level policies are based on model-free policy search, which requires an excessive number of interactions between the robot and its environment. More data-efficient policy search approaches are model-based but, thus far, lack the capability to learn hierarchical policies. We propose a new model-based policy search approach that can also learn contextual upper-level policies. Our approach is based on learning probabilistic forward models for long-term predictions, and we apply information-theoretic insights to these predictions to improve the upper-level policy. Our method achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks.
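A minimal sketch of the two-level decomposition in the abstract: an upper-level policy maps a task context to the parameters of a lower-level controller. A linear-Gaussian upper-level policy is a common choice in contextual policy search; the names and shapes below are assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

class UpperLevelPolicy:
    """pi(w | c) = N(K c + k, Sigma): maps a context c to controller params w."""
    def __init__(self, context_dim, param_dim):
        self.K = np.zeros((param_dim, context_dim))
        self.k = np.zeros(param_dim)
        self.Sigma = np.eye(param_dim)

    def sample(self, c):
        return rng.multivariate_normal(self.K @ c + self.k, self.Sigma)

def lower_level_controller(w, x):
    """Lower-level policy for one context: linear state feedback with gains w."""
    return w.reshape(2, 2) @ x

upper = UpperLevelPolicy(context_dim=2, param_dim=4)
c = np.array([0.5, -1.0])   # e.g. the target position that defines the task
w = upper.sample(c)         # controller parameters tailored to this context
u = lower_level_controller(w, np.ones(2))
```

Learning then adjusts K, k, and Sigma from observed (context, parameters, reward) triples, so the same upper-level policy generalizes across contexts instead of relearning each task from scratch.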
Variational policy search via trajectory optimization
2013
"... In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when co ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
(Show Context)
In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself. We demonstrate that the resulting algorithm can outperform prior methods on two challenging locomotion tasks.
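A minimal sketch of the alternation the abstract describes: a trajectory optimizer proposes a good execution, then the policy is fit to it by supervised regression. The placeholder below fabricates an "optimal" trajectory instead of running DDP, so every name and simplification here is an assumption for illustration only.

```python
import numpy as np

def optimize_trajectory(x0, horizon):
    """Placeholder for the DDP phase: returns states and actions of an
    improved trajectory. In the paper this phase also stays close to the
    current policy; the variational bound is what couples the two phases."""
    xs = np.linspace(x0, 0.0, horizon)   # pretend-optimal path to the goal
    us = np.diff(xs, append=0.0)         # actions that realise that path
    return xs, us

def fit_policy(xs, us):
    """Supervised phase: regress actions on states (a linear policy here)."""
    theta, *_ = np.linalg.lstsq(xs.reshape(-1, 1), us, rcond=None)
    return theta

theta = np.zeros(1)
for _ in range(5):                        # alternate the two phases
    xs, us = optimize_trajectory(x0=1.0, horizon=20)
    theta = fit_policy(xs, us)
```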
RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control
"... Abstract—Reinforcement Learning (RL) is a paradigm for learning decision-making tasks that could enable robots to learn and adapt to their situation on-line. For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-tim ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
(Show Context)
Reinforcement Learning (RL) is a paradigm for learning decision-making tasks that could enable robots to learn and adapt to their situation on-line. For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-time. Existing model-based RL methods learn in relatively few samples, but typically take too much time between each action for practical on-line learning. In this paper, we present a novel parallel architecture for model-based RL that runs in real-time by 1) taking advantage of sample-based approximate planning methods and 2) parallelizing the acting, model learning, and planning processes in a novel way such that the acting process is sufficiently fast for typical robot control cycles. We demonstrate that algorithms using this architecture perform nearly as well as methods using the typical sequential architecture when both are given unlimited time, and greatly outperform these methods on tasks that require real-time actions such as controlling an autonomous vehicle.
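A toy sketch of the parallel decomposition the abstract names: acting, model learning, and planning run concurrently so the control loop never blocks on planning. Threads and trivial stand-in updates replace the paper's actual processes; all names and numbers below are assumptions.

```python
import threading, queue, time

experience = queue.Queue()        # acting -> model learning
model_lock = threading.Lock()
model = {"estimate": 0.0}         # stand-in for the learned dynamics model
plan = {"action": 0}              # latest plan, read by the acting loop

def acting_loop(n_steps=200):
    for _ in range(n_steps):      # fixed, fast control cycle: never blocks
        action = plan["action"]   # act on whatever the planner has so far
        observation = action + 1  # stand-in for the robot's response
        experience.put((action, observation))
        time.sleep(0.001)

def model_learning_loop():
    while True:
        sample = experience.get()
        if sample is None:        # shutdown sentinel
            return
        with model_lock:
            model["estimate"] += 0.01   # stand-in model update

def planning_loop(stop):
    while not stop.is_set():      # anytime, sample-based planning
        with model_lock:
            plan["action"] = int(model["estimate"]) % 3
        time.sleep(0.001)

stop = threading.Event()
workers = [threading.Thread(target=model_learning_loop),
           threading.Thread(target=planning_loop, args=(stop,))]
for w in workers:
    w.start()
acting_loop()
experience.put(None)
stop.set()
for w in workers:
    w.join()
```

The key property is visible in `acting_loop`: it reads the latest plan and returns within its control cycle regardless of how long the planner needs.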
Learning neural network policies with guided policy search under unknown dynamics
In Advances in Neural Information Processing Systems, 2014
"... We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These tra-jectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These trajectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our method fits time-varying linear dynamics models to speed up learning, but does not rely on learning a global model, which can be difficult when the dynamics are complex and discontinuous. We show that this hybrid approach requires many fewer samples than model-free methods, and can handle complex, nonsmooth dynamics that can pose a challenge for model-based techniques. We present experiments showing that our method can be used to learn complex neural network policies that successfully execute simulated robotic manipulation tasks in partially observed environments with numerous contact discontinuities and underactuation.
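A minimal sketch of the time-varying linear dynamics fit at the core of the abstract: for each time step t, estimate x_{t+1} ≈ A_t x_t + B_t u_t + c_t by least squares over the sampled rollouts. The function and variable names are assumptions, and this is only the simplest version of the fit, without the regularization a practical implementation would add.

```python
import numpy as np

def fit_time_varying_dynamics(X, U):
    """X: (N, T+1, dx) states, U: (N, T, du) actions from N rollouts.
    Returns per-step (A_t, B_t, c_t) least-squares estimates such that
    x_{t+1} ~ A_t @ x_t + B_t @ u_t + c_t."""
    N, T1, dx = X.shape
    du = U.shape[2]
    models = []
    for t in range(T1 - 1):
        # regress next states on [x_t, u_t, 1] across the N rollouts
        Phi = np.concatenate([X[:, t], U[:, t], np.ones((N, 1))], axis=1)
        W, *_ = np.linalg.lstsq(Phi, X[:, t + 1], rcond=None)
        A_t, B_t, c_t = W[:dx].T, W[dx:dx + du].T, W[-1]
        models.append((A_t, B_t, c_t))
    return models

# Example with synthetic rollouts (N should exceed dx + du + 1 per time step):
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6, 2))   # 10 rollouts, horizon 5, 2-d state
U = rng.normal(size=(10, 5, 1))   # 1-d actions
models = fit_time_varying_dynamics(X, U)
```

Because each (A_t, B_t, c_t) is local to a time step and to the current trajectory distribution, the fit stays accurate even when a single global model would have to capture discontinuous contact dynamics.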
Learning Complex Neural Network Policies with Trajectory Optimization
"... Direct policy search methods offer the promise of automatically learning controllers for com-plex, high-dimensional tasks. However, prior ap-plications of policy search often required spe-cialized, low-dimensional policy classes, limit-ing their generality. In this work, we introduce a policy search ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
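A scalar toy sketch of the alternating objective in the abstract: the trajectory is optimized to be low-cost and to stay close to the current policy, then the policy is regressed onto the optimized trajectory. Crude random search stands in for the paper's trajectory optimizer, and every name below is an assumption.

```python
import numpy as np

def trajectory_objective(us, x0, theta, lam):
    """Cost of an action sequence plus a penalty for deviating from pi_theta."""
    x, cost, penalty = x0, 0.0, 0.0
    for u in us:
        penalty += (u - theta * x) ** 2   # stay close to pi_theta(x) = theta * x
        x = x + 0.1 * u                   # toy scalar dynamics
        cost += x ** 2                    # quadratic state cost
    return cost + lam * penalty

theta, lam = 0.0, 1.0
rng = np.random.default_rng(0)
us = np.zeros(20)
for _ in range(10):
    # optimise the trajectory (random search stands in for a real optimizer)
    for _ in range(200):
        cand = us + 0.1 * rng.normal(size=us.shape)
        if trajectory_objective(cand, 1.0, theta, lam) < \
           trajectory_objective(us, 1.0, theta, lam):
            us = cand
    # refit the policy to the optimised trajectory (1-d linear regression)
    xs, x = [], 1.0
    for u in us:
        xs.append(x)
        x = x + 0.1 * u
    xs = np.array(xs)
    theta = float(xs @ us / (xs @ xs))
```

The `lam * penalty` term is what couples the two phases: without it, the optimized trajectory could wander into regions the policy class cannot reproduce.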
DeepMPC: Learning deep latent features for model predictive control
In RSS, 2015
"... Abstract—Designing controllers for tasks with complex non-linear dynamics is extremely challenging, time-consuming, and in many cases, infeasible. This difficulty is exacerbated in tasks such as robotic food-cutting, in which dynamics might vary both with environmental properties, such as material a ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Designing controllers for tasks with complex non-linear dynamics is extremely challenging, time-consuming, and in many cases, infeasible. This difficulty is exacerbated in tasks such as robotic food-cutting, in which dynamics might vary both with environmental properties, such as material and tool class, and with time while acting. In this work, we present DeepMPC, an online real-time model-predictive control approach designed to handle such difficult tasks. Rather than hand-design a dynamics model for the task, our approach uses a novel deep architecture and learning algorithm, learning controllers for complex tasks directly from data. We validate our method in experiments on a large-scale dataset of 1488 material cuts for 20 diverse classes, and in 450 real-world robotic experiments, demonstrating significant improvement over several other approaches.
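A minimal sketch of the receding-horizon loop that any model-predictive controller, including the one in the abstract, runs: at each control step, optimize a short action sequence against the learned model and execute only the first action. The deep latent-feature model is replaced here by a generic callable, and random shooting stands in for whatever optimizer the paper uses; all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, u):
    """Stand-in for the learned dynamics model (scalar toy system)."""
    return 0.9 * x + 0.1 * u

def mpc_action(x0, horizon=10, n_candidates=200):
    """Random-shooting MPC: pick the action sequence with lowest predicted cost."""
    best_cost, best_u0 = np.inf, 0.0
    for _ in range(n_candidates):
        us = rng.uniform(-1, 1, size=horizon)
        x, cost = x0, 0.0
        for u in us:
            x = model(x, u)
            cost += x ** 2             # track the target x = 0
        if cost < best_cost:
            best_cost, best_u0 = cost, us[0]
    return best_u0

x = 1.0
for step in range(50):                 # receding-horizon control loop
    u = mpc_action(x)
    x = model(x, u)                    # in reality: apply u to the robot
```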
Sample-Based Information-Theoretic Stochastic Optimal Control
"... proaches rely on samples to either obtain an estimate of the value function or a linearisation of the underlying system model. However, these approaches typically neglect the fact that the accuracy of the policy update depends on the closeness of the resulting trajectory distribution to these sample ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Many stochastic optimal control (SOC) approaches rely on samples to either obtain an estimate of the value function or a linearisation of the underlying system model. However, these approaches typically neglect the fact that the accuracy of the policy update depends on the closeness of the resulting trajectory distribution to these samples. The greedy operator does not consider such a closeness constraint to the samples and can hence lead to oscillations or even instabilities in the policy updates. Such undesired behaviour is likely to result in an inferior performance of the estimated policy. We take inspiration from the reinforcement learning community and relax the greedy operator used in SOC with an information-theoretic bound that limits the 'distance' between two subsequent trajectory distributions in a policy update. The introduced bound ensures a smooth and stable policy update. Our method is also well suited for model-based reinforcement learning, where we estimate the system dynamics model from data. As this model is likely to be inaccurate, it might be dangerous to exploit the model greedily. Instead, our bound ensures that we generate new data in the vicinity of the current data, such that we can improve our estimate of the system dynamics model. We show that our approach outperforms several state-of-the-art approaches on challenging simulated robot control tasks.
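A minimal sketch of the information-theoretic bound the abstract describes, in its simplest (REPS-style) form: sampled trajectories are exponentially re-weighted so that the KL divergence between successive distributions stays below a bound epsilon, with the temperature eta obtained from the dual. The variable names and the scalar setting are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
returns = rng.normal(size=100)          # returns of sampled trajectories
epsilon = 0.5                           # KL bound on the policy update

def dual(log_eta):
    """REPS dual g(eta) = eta*eps + eta*log mean(exp(R/eta)), minimised
    over eta > 0 (parameterised by log_eta for positivity)."""
    eta = np.exp(log_eta)
    R = returns / eta
    # log-sum-exp shift for numerical stability
    return eta * epsilon + eta * (np.log(np.mean(np.exp(R - R.max()))) + R.max())

eta = np.exp(minimize_scalar(dual, bounds=(-5, 5), method="bounded").x)
weights = np.exp((returns - returns.max()) / eta)
weights /= weights.sum()                # sample weights for the next policy fit
```

Fitting the next trajectory distribution to these weights, rather than to the single best sample, is what keeps each update within epsilon of the data that produced it.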