Results 1 - 10 of 70
A Survey of Robot Learning from Demonstration
"... We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state to action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a ..."
Cited by 281 (19 self)
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state-to-action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude, we discuss LfD limitations and related promising areas for future research.
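As a concrete illustration of the simplest policy-derivation route the survey names (a matching function from demonstrated states to actions), the following minimal sketch derives a nearest-neighbor policy from example state-action pairs; the states, actions, and dimensionality are invented for illustration.

```python
import numpy as np

# Demonstrations as (state, action) pairs gathered from a teacher.
# Both the 2-D states and the scalar actions here are hypothetical.
demo_states = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.3]])
demo_actions = np.array([0.2, 0.4, 0.6])

def nn_policy(state):
    """Matching-function policy: return the action whose
    demonstrated state is closest to the query state."""
    dists = np.linalg.norm(demo_states - state, axis=1)
    return demo_actions[np.argmin(dists)]

print(nn_policy(np.array([0.6, 0.2])))  # -> 0.4
```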
Reinforcement Learning in Robotics: A Survey
"... Reinforcement learning offers to robotics a framework and set oftoolsfor the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between di ..."
Cited by 39 (2 self)
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and …
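To make the value-function-based family that the survey contrasts with policy search concrete, here is a minimal tabular Q-learning update; this is a generic textbook sketch on a toy MDP of our own invention, not code from the survey.

```python
import numpy as np

# Toy illustration of a value-function-based method: tabular
# Q-learning. The 5-state, 2-action MDP here is an arbitrary stand-in.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95  # learning rate, discount factor

def q_update(s, a, r, s_next):
    """One temporal-difference backup toward the Bellman target."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)
print(Q[0])  # -> [0.  0.1]
```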
Learning Robot Motion Control with Demonstration and Advice-Operators
"... Abstract — As robots become more commonplace within society, the need for tools to enable non-robotics-experts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, where the robot learns a policy from teacher task executions. Our ..."
Cited by 33 (15 self)
As robots become more commonplace within society, the need for tools that enable non-robotics-experts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, where the robot learns a policy from teacher task executions. Our interests lie with robot motion control policies, which map world observations to continuous low-level actions. In this work, we introduce Advice-Operator Policy Improvement (A-OPI) as a novel approach for improving policies within LfD. Two distinguishing characteristics of the A-OPI algorithm are its data source and its continuous state-action space. Within LfD, more example data can improve a policy. In A-OPI, new data is synthesized from a student execution and teacher advice. By contrast, typical demonstration approaches provide the learner exclusively with teacher executions. A-OPI is effective within continuous state-action spaces because high-level human advice is translated into continuous-valued corrections on the student execution. This work presents a first implementation of the A-OPI algorithm, validated on a Segway RMP robot performing a spatial positioning task. A-OPI is found to improve task performance, in both success and accuracy. Furthermore, performance is shown to be similar or superior to the typical approach of exclusively teacher demonstrations.
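A minimal sketch of the A-OPI idea as the abstract describes it: teacher advice is translated into a continuous correction on a recorded student execution, and the corrected points are appended as new training data. The specific advice operator and the recorded execution below are hypothetical.

```python
import numpy as np

# A recorded student execution: states visited and actions taken.
# Both arrays are invented for illustration.
student_states = np.array([[0.0, 0.0], [0.4, 0.1]])
student_actions = np.array([[0.5], [0.5]])

def advice_operator(actions, advice):
    """Hypothetical operator: 'slower' scales actions down by 20%."""
    return actions * 0.8 if advice == "slower" else actions

corrected = advice_operator(student_actions, "slower")
# Synthesized data: student states paired with corrected actions,
# to be appended to the demonstration set for policy re-derivation.
new_data = list(zip(student_states, corrected))
print(new_data)
```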
Learning by demonstration with critique from a human teacher
- In HRI
, 2007
"... Learning by demonstration can be a powerful and natural tool for developing robot control policies. That is, instead of tedious hand-coding, a robot may learn a control policy by interacting with a teacher. In this work we present an algorithm for learning by demonstration in which the teacher opera ..."
Cited by 31 (2 self)
Learning by demonstration can be a powerful and natural tool for developing robot control policies. That is, instead of tedious hand-coding, a robot may learn a control policy by interacting with a teacher. In this work we present an algorithm for learning by demonstration in which the teacher operates in two phases. The teacher first demonstrates the task to the learner. The teacher next critiques the learner's performance of the task. This critique is used by the learner to update its control policy. In our implementation we utilize a 1-Nearest-Neighbor technique that incorporates both the training dataset and the teacher critique. Since the teacher critiques performance only, they do not need to guess at an effective critique for the underlying algorithm. We argue that this method is particularly well suited to human teachers, who are generally better at assigning credit to performances than to algorithms. We have applied this algorithm to the simulated task of a robot intercepting a ball. Our results demonstrate improved performance with teacher critiquing, where performance is measured by both execution success and efficiency.
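The abstract names a 1-Nearest-Neighbor learner that incorporates both the training dataset and the teacher critique. One plausible reading, sketched below under our own assumptions, is a per-point credit weight that critiques raise or lower; the specific weighting rule is not from the paper.

```python
import numpy as np

# Demonstration set with per-point credit weights. The states,
# actions, and the down-weighting scheme are our assumptions.
states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
actions = np.array([0, 1, 2])
weights = np.ones(len(states))  # credit assigned by critiques

def policy(query):
    """Pick the neighbor with the best weight-scaled proximity."""
    dists = np.linalg.norm(states - query, axis=1)
    scores = weights / (dists + 1e-6)
    best = int(np.argmax(scores))
    return best, int(actions[best])

def critique(point_idx, good):
    """Teacher critiques a performance; the responsible point's
    weight is raised or lowered accordingly."""
    weights[point_idx] *= 1.2 if good else 0.8

idx, act = policy(np.array([0.1, 0.1]))
critique(idx, good=False)  # poor execution: down-weight that point
```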
Knowledge transfer using local features
- In Proceedings of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL’07
, 2007
"... Abstract-We present a method for reducing the effort required to compute policies for tasks based on solutions to previously solved tasks. The key idea is to use a learned intermediate policy based on local features to create an initial policy for the new task. In order to further improve this init ..."
Cited by 19 (1 self)
We present a method for reducing the effort required to compute policies for tasks, based on solutions to previously solved tasks. The key idea is to use a learned intermediate policy based on local features to create an initial policy for the new task. To further improve this initial policy, we developed a form of generalized policy iteration. We achieve a substantial reduction in the computation needed to find policies when previous experience is available.
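A rough sketch of the transfer recipe described above, under assumed details: a scorer learned on local features of previously solved tasks initializes the policy for a new task, and a few rounds of generalized policy iteration refine it. The feature map, weights, and the random toy MDP are placeholders.

```python
import numpy as np

np.random.seed(0)
n_states, n_actions, gamma = 4, 2, 0.9
# Random toy MDP: transition tensor P[s, a] and rewards R[s, a].
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = np.random.rand(n_states, n_actions)

def local_features(s):
    return np.array([s % 2, s // 2])  # hypothetical local descriptor

w = np.array([0.3, -0.1])  # weights learned on previous tasks

# Step 1: initial policy from the transferred local-feature scorer.
policy = np.array([int(local_features(s) @ w > 0) for s in range(n_states)])

# Step 2: a few rounds of policy iteration to adapt to the new task.
for _ in range(10):
    # Policy evaluation: solve the linear system for V under policy.
    Pp = np.array([P[s, policy[s]] for s in range(n_states)])
    Rp = np.array([R[s, policy[s]] for s in range(n_states)])
    V = np.linalg.solve(np.eye(n_states) - gamma * Pp, Rp)
    # Policy improvement: greedy one-step lookahead.
    policy = np.argmax(R + gamma * P @ V, axis=1)
print(policy)
```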
Learning similar tasks from observation and practice
- In International Conference on Intelligent Robots and Systems
, 2006
"... Abstract- This paper presents a case study of learning to select behavioral primitives and generate subgoals from observation and practice. Our approach uses local features to generalize across tasks and global features to learn from practice. We demonstrate this approach applied to the marble maze ..."
Cited by 14 (3 self)
This paper presents a case study of learning to select behavioral primitives and generate subgoals from observation and practice. Our approach uses local features to generalize across tasks and global features to learn from practice. We demonstrate this approach applied to the marble maze task. Our robot uses local features to initially learn primitive-selection and subgoal-generation policies from observing a teacher maneuver a marble through a maze. The robot then uses this information as it tries to traverse another maze, and refines the information while learning from practice.
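A minimal sketch of local-feature primitive selection as described: the current situation's local features are matched against remembered teacher examples, and the nearest one decides the primitive. The feature vectors and primitive names below are illustrative, not taken from the paper.

```python
import numpy as np

# Remembered teacher examples: local features and the primitive the
# teacher used in each situation. All values are illustrative.
memory_features = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])
memory_primitive = ["roll_off_wall", "guide", "corner"]

def select_primitive(local_feat):
    """Nearest remembered situation decides the primitive."""
    d = np.linalg.norm(memory_features - local_feat, axis=1)
    return memory_primitive[int(np.argmin(d))]

print(select_primitive(np.array([0.7, 0.3])))  # -> "guide"
```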
Tactile guidance for policy refinement and reuse
"... Abstract—Demonstration learning is a powerful and practical technique to develop robot behaviors. Even so, development remains a challenge and possible demonstration limitations can degrade policy performance. This work presents an approach for policy improvement and adaptation through a tactile int ..."
Cited by 12 (5 self)
Demonstration learning is a powerful and practical technique for developing robot behaviors. Even so, development remains a challenge, and possible limitations in the demonstrations can degrade policy performance. This work presents an approach for policy improvement and adaptation through a tactile interface located on the body of a robot. We introduce the Tactile Policy Correction (TPC) algorithm, which employs tactile feedback for the refinement of a demonstrated policy, as well as its reuse for the development of other policies. We validate TPC on a humanoid robot performing grasp-positioning tasks. The performance of the demonstrated policy is found to improve with tactile corrections. Tactile guidance is also shown to enable the development of policies able to successfully execute novel, undemonstrated tasks. Through the tactile interface, the human teacher indicates relative adjustments to the robot pose (policy predictions) online, as the robot executes. The robot immediately modifies its pose to accommodate the adjustment, and the resulting adjusted pose is treated as new training data for the policy.
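A minimal sketch of the TPC loop as the abstract describes it: a tactile signal maps to a relative pose adjustment, the robot applies it immediately, and the adjusted pose becomes new training data. The tactile-to-offset mapping and pose representation are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical mapping from tactile signals to relative pose offsets.
TOUCH_TO_OFFSET = {
    "press_left": np.array([-0.01, 0.0, 0.0]),
    "press_right": np.array([0.01, 0.0, 0.0]),
}

training_data = []  # (observation, corrected pose) pairs

def step(observation, predicted_pose, touch=None):
    pose = predicted_pose.copy()
    if touch is not None:
        pose += TOUCH_TO_OFFSET[touch]              # apply adjustment
        training_data.append((observation, pose))   # refine the policy
    return pose

step(np.zeros(3), np.array([0.2, 0.1, 0.3]), touch="press_left")
print(training_data)
```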
M. J. Matarić, "Toward a vocabulary of primitive task programs for humanoid robots"
- Proc. of the International Conference on Development and Learning (ICDL), Bloomington, IN
, 2006
"... Abstract — Researchers and engineers have used primitive actions to facilitate programming of tasks since the days of Shakey [1]. Task-level programming, which requires the user to specify only subgoals of a task to be accomplished, depends on such a set of primitive task programs to perform these s ..."
Cited by 11 (1 self)
Researchers and engineers have used primitive actions to facilitate the programming of tasks since the days of Shakey [1]. Task-level programming, which requires the user to specify only the subgoals of a task to be accomplished, depends on such a set of primitive task programs to achieve those subgoals. Past research in this area has used the commands of robot programming languages as the vocabulary of primitive tasks for robotic manipulators. We propose drawing from work measurement systems to construct the vocabulary of primitive task programs. We describe one such work measurement system, present several primitive task programs for humanoid robots inspired by this system, and show how these primitive programs can be used to construct complex behaviors. Index Terms: robot programming, task-level programming, humanoid robots
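As a sketch of task-level programming over a primitive vocabulary, the composition below builds a complex behavior from primitive task programs; the primitive names echo work-measurement verbs (reach, grasp, move, release) but are our placeholders, not the paper's programs.

```python
# Each primitive task program achieves one subgoal; a complex
# behavior is just a sequence of them. Bodies are stubs.
def reach(obj): print(f"reach {obj}")
def grasp(obj): print(f"grasp {obj}")
def move(obj, dest): print(f"move {obj} to {dest}")
def release(obj): print(f"release {obj}")

def pick_and_place(obj, dest):
    """Complex behavior composed from primitive task programs."""
    reach(obj)
    grasp(obj)
    move(obj, dest)
    release(obj)

pick_and_place("cup", "shelf")
```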
Learning from observation and practice using primitives
- In AAAI Fall Symposium Series, Symposium on Real-life Reinforcement Learning, Washington, USA
, 2004
"... We explore how to enable robots to rapidly learn from watching a human or robot perform a task, and from practicing the task itself. A key component of our approach is to use small units of behavior, which we refer to as behavioral primitives. Another key component is to use the observed human behav ..."
Cited by 10 (1 self)
We explore how to enable robots to rapidly learn from watching a human or robot perform a task, and from practicing the task itself. A key component of our approach is to use small units of behavior, which we refer to as behavioral primitives. Another key component is to use the observed human behavior to define the space to be explored during learning from practice. In this paper we manually define task-appropriate primitives by programming how to find them in the training data. We describe memory-based approaches to learning how to select and provide subgoals for behavioral primitives. We demonstrate both learning from observation and learning from practice on a marble maze task, Labyrinth. Using behavioral primitives greatly speeds up learning relative to learning with a direct mapping from states to actions.
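A minimal sketch of how observed behavior can define the space explored during practice, under assumed details: subgoals from the teacher's demonstration seed the learner, and practice perturbs only locally around them rather than searching globally. The subgoals and perturbation scale are invented.

```python
import numpy as np

# Subgoals extracted from the teacher's observed demonstration.
observed_subgoals = np.array([[0.2, 0.2], [0.5, 0.4], [0.8, 0.7]])

def propose_subgoals(rng, scale=0.05):
    """Search locally around demonstrated subgoals, not globally."""
    return observed_subgoals + rng.normal(0.0, scale, observed_subgoals.shape)

rng = np.random.default_rng(0)
candidate = propose_subgoals(rng)
# A practice trial would execute primitives toward each candidate
# subgoal and keep perturbations that improve task performance.
print(candidate)
```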
Interactive Human Pose and Action Recognition Using Dynamical Motion Primitives
, 2007
"... There is currently a division between real-world human performance and the decision making of socially interactive robots. This circumstance is partially due to the difficulty in estimating human cues, such as pose and gesture, from robot sensing. Towards bridging this division, we present a method ..."
Cited by 10 (2 self)
There is currently a division between real-world human performance and the decision making of socially interactive robots. This circumstance is partially due to the difficulty of estimating human cues, such as pose and gesture, from robot sensing. Towards bridging this division, we present a method for kinematic pose estimation and action recognition from monocular robot vision through the use of dynamical human motion vocabularies. Our notion of a motion vocabulary is comprised of movement primitives that structure a human’s action space for decision making and predict human movement dynamics. Through prediction, such primitives can be used both to generate motor commands for specific actions and to perceive humans performing those actions. In this paper, we focus specifically on the perception of human pose and performed actions using a known vocabulary of primitives. Given image observations over time, each primitive infers pose independently using its expected dynamics in the context of a particle filter. Pose estimates from a set of primitives performing inference in parallel are arbitrated to estimate the action being performed. The efficacy of our approach is demonstrated through interactive-time pose and action recognition over extended motion trials. Results show that our approach requires small numbers of particles for tracking, is robust to unsegmented multi-action movement, movement speed, and camera viewpoint, and is able to recover from occlusions.
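A compact sketch of the arbitration scheme described above, with stub dynamics and likelihoods of our own choosing: one particle filter per motion primitive tracks pose under that primitive's expected dynamics, and the recognized action is the primitive whose filter best explains the observations.

```python
import numpy as np

rng = np.random.default_rng(1)

class PrimitiveFilter:
    """Particle filter whose prediction follows one primitive's
    expected dynamics (here a simple 1-D drift, for brevity)."""

    def __init__(self, name, drift, n=50):
        self.name, self.drift = name, drift
        self.particles = rng.normal(0.0, 1.0, n)

    def step(self, observation):
        # Predict with the primitive's dynamics, then weight
        # particles by how well they match the observation.
        self.particles += self.drift + rng.normal(0.0, 0.1, self.particles.size)
        w = np.exp(-0.5 * (self.particles - observation) ** 2)
        w /= w.sum()
        idx = rng.choice(self.particles.size, self.particles.size, p=w)
        self.particles = self.particles[idx]   # resample
        return w.max()                         # crude fit score

filters = [PrimitiveFilter("wave", 0.3), PrimitiveFilter("point", -0.1)]
scores = [f.step(observation=0.25) for f in filters]
print("action:", filters[int(np.argmax(scores))].name)
```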