Results 1-10 of 205
A Survey of Robot Learning from Demonstration
Cited by 274 (19 self)
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state to action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude we discuss LfD limitations and related promising areas for future research.
Movement Imitation with Nonlinear Dynamical Systems in Humanoid Robots
 In IEEE International Conference on Robotics and Automation (ICRA 2002), 2002
Cited by 201 (27 self)
This article presents a new approach to movement planning, online trajectory modification, and imitation learning by representing movement plans based on a set of nonlinear differential equations with well-defined attractor dynamics. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy (CP) which is robust to strong external perturbations and that can be modified online by additional perceptual variables. The attractor landscape of the control policy can be learned rapidly with a locally weighted regression technique with guaranteed convergence of the learning algorithm and convergence to the movement target. This property makes the system suitable for movement imitation and also for classifying demonstrated movement according to the parameters of the learning system.
Learning attractor landscapes for learning motor primitives
 in Advances in Neural Information Processing Systems, 2003
Cited by 193 (28 self)
Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often defined as finding a desired trajectory that reaches a particular goal state. While reinforcement learning offers a theoretical framework to learn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system. We demonstrate our techniques in the context of learning a set of movement skills for a humanoid robot from demonstrations of a human teacher. Policies are acquired rapidly, and, due to the properties of well formulated differential equations, can be reused and modified online under dynamic changes of the environment. The linear parameterization of nonparametric regression moreover lends itself to recognize and classify previously learned movement skills. Evaluations in simulations and on an actual 30 degree-of-freedom humanoid robot exemplify the feasibility and robustness of our approach.
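The idea in the abstract above, a stable canonical attractor that is reshaped by a nonlinear term without losing convergence to the goal, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rollout`, the gains `k` and `d`, the decay rate `alpha_x`, the basis-width heuristic and the hand-picked weights are all assumptions made for illustration, and the weights here are fixed by hand rather than learned from a demonstration.

```python
import math


def rollout(weights, y0=0.0, goal=1.0, tau=1.0, dt=0.001, duration=3.0,
            k=100.0, d=20.0, alpha_x=4.0):
    """Integrate a point attractor whose shape is modulated by a forcing term.

    The forcing term is gated by a phase variable x that decays to zero,
    so its influence vanishes over time and the attractor at `goal` wins.
    All constants are illustrative assumptions.
    """
    n = len(weights)
    # Basis centres spaced along the decaying phase variable x in (0, 1].
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Heuristic widths (an assumption): narrower bases for small centres.
    widths = [n ** 1.5 / c for c in centres]

    y, v, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(int(duration / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        total = sum(psi)
        forcing = 0.0
        if total > 1e-12:
            forcing = x * sum(p * w for p, w in zip(psi, weights)) / total
        # Critically damped spring toward the goal plus the phase-gated term.
        dv = (k * (goal - y) - d * v + forcing * (goal - y0)) / tau
        dy = v / tau
        dx = -alpha_x * x / tau
        v += dv * dt
        y += dy * dt
        x += dx * dt
        path.append(y)
    return path


plain = rollout([0.0] * 5)                          # canonical attractor only
shaped = rollout([50.0, -30.0, 20.0, -10.0, 5.0])   # reshaped trajectory
```

Both rollouts end near the goal, but they follow different paths on the way there, which is the property the abstract attributes to transforming a canonical policy.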
Incremental Online Learning in High Dimensions
 Neural Computation, 2005
Cited by 162 (18 self)
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high dimensional spaces.
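The core ingredient named in the abstract above, nonparametric regression with locally linear models, can be sketched in a much simplified form. The code below is a batch, one-dimensional locally weighted regression, not LWPR itself: it has no incremental updates, no projection directions and no kernel adaptation. The function names, the fixed Gaussian kernel width and the sine test data are assumptions chosen for illustration.

```python
import math


def fit_local_models(xs, ys, centres, width):
    """Fit one weighted least-squares line per Gaussian kernel centre."""
    models = []
    for c in centres:
        w = [math.exp(-0.5 * ((x - c) / width) ** 2) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        models.append((c, slope, my - slope * mx))
    return models


def predict(models, width, x):
    """Blend the local linear predictions with normalized kernel weights."""
    num = den = 0.0
    for c, slope, intercept in models:
        w = math.exp(-0.5 * ((x - c) / width) ** 2)
        num += w * (slope * x + intercept)
        den += w
    return num / den


# Hypothetical training data: samples of sin(x) on [0, 6.3].
xs = [i * 0.05 for i in range(127)]
ys = [math.sin(x) for x in xs]
centres = [i * 0.25 for i in range(25)]
models = fit_local_models(xs, ys, centres, width=0.2)
estimate = predict(models, 0.2, 1.0)   # close to math.sin(1.0)
```

Each local model only sees data near its centre, which is why such methods can adjust one region without disturbing another; LWPR adds incremental updates and low-dimensional projections on top of this basic scheme.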
Reinforcement learning for humanoid robotics
 Autonomous Robot, 2003
Cited by 133 (21 self)
The complexity of the kinematic and dynamic structure of humanoid robots makes conventional analytical approaches to control increasingly unsuitable for such systems. Learning techniques offer a possible way to aid controller design if insufficient analytical knowledge is available, and learning approaches seem mandatory when humanoid systems are supposed to become completely autonomous. While recent research in neural networks and statistical learning has focused mostly on learning from finite data sets without stringent constraints on computational efficiency, learning for humanoid robots requires a different setting, characterized by the need for real-time learning performance from an essentially infinite stream of incrementally arriving data. This paper demonstrates how even high-dimensional learning problems of this kind can successfully be dealt with by techniques from nonparametric regression and locally weighted learning. As an example, we describe the application of one of the most advanced of such algorithms, Locally Weighted Projection Regression (LWPR), to the online learning of three problems in humanoid motor control: the learning of inverse dynamics models for model-based control, the learning of inverse kinematics of redundant manipulators, and the learning of oculomotor reflexes. All these examples demonstrate fast, i.e., within seconds or minutes, learning convergence with highly accurate final performance. We conclude that real-time learning for complex motor systems like humanoid robots is possible with appropriately tailored algorithms, such that increasingly autonomous robots with massive learning abilities should be achievable in the near future.
Learning from demonstration and adaptation of biped locomotion
 Robotics and Autonomous Systems, 2004
Cited by 127 (9 self)
In this paper, we report on our research for learning biped locomotion from human demonstration. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-like locomotion. We suggest dynamical movement primitives as a CPG of a biped robot, an approach we have previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned through the movement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically by a novel frequency adaptation algorithm based on phase resetting and entrainment of oscillators. Numerical simulations demonstrate the effectiveness of the proposed locomotion controller.
Learning inverse kinematics
 in Proc. IROS, 2001
Cited by 106 (13 self)
Real-time control of the end-effector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates inverse kinematics learning for resolved motion rate control (RMRC) employing an optimization criterion to resolve kinematic redundancies. Our learning approach is based on the key observations that learning an inverse of a non-uniquely invertible function can be accomplished by augmenting the input representation to the inverse model and by using a spatially localized learning approach. We apply this strategy to inverse kinematics learning and demonstrate how a recently developed statistical learning algorithm, Locally Weighted Projection Regression, allows efficient learning of inverse kinematic mappings in an incremental fashion even when input spaces become rather high dimensional. The resulting performance of the inverse kinematics is comparable to Liegeois' [9] analytical pseudoinverse with optimization. Our results are illustrated with a 30 degree of freedom humanoid robot.
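For orientation, the analytical baseline mentioned in the abstract above can be sketched as resolved motion rate control with a Jacobian pseudoinverse on a redundant arm. This shows only the plain minimum-norm pseudoinverse, without the optimization term and without any learning, so it is not the paper's method. The three-link planar arm, the link lengths, the gain and all function names are hypothetical.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # hypothetical link lengths of a planar 3-joint arm


def forward(q):
    """End-effector position (x, y) for joint angles q."""
    x = y = angle = 0.0
    for length, qi in zip(LINKS, q):
        angle += qi
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def jacobian(q):
    """Rows of the 2x3 Jacobian: d(x)/d(q_j) and d(y)/d(q_j)."""
    cumulative, angle = [], 0.0
    for qi in q:
        angle += qi
        cumulative.append(angle)
    n = len(q)
    jx = [-sum(LINKS[k] * math.sin(cumulative[k]) for k in range(j, n))
          for j in range(n)]
    jy = [sum(LINKS[k] * math.cos(cumulative[k]) for k in range(j, n))
          for j in range(n)]
    return jx, jy


def rmrc_step(q, dx, dy):
    """Minimum-norm joint update dq = J^T (J J^T)^-1 [dx, dy]."""
    jx, jy = jacobian(q)
    a = sum(v * v for v in jx)
    b = sum(u * v for u, v in zip(jx, jy))
    d = sum(v * v for v in jy)
    det = a * d - b * b  # zero only at a kinematic singularity
    l1 = (d * dx - b * dy) / det
    l2 = (a * dy - b * dx) / det
    return [qi + jx[i] * l1 + jy[i] * l2 for i, qi in enumerate(q)]


def reach(q, target, gain=0.1, steps=400):
    """Repeatedly command a fraction of the remaining task-space error."""
    for _ in range(steps):
        x, y = forward(q)
        q = rmrc_step(q, gain * (target[0] - x), gain * (target[1] - y))
    return q


q_final = reach([0.2, 0.8, 0.8], (1.5, 1.5))
```

The arm has three joints for a two-dimensional task, so many joint updates achieve the same end-effector motion; the pseudoinverse picks the smallest one, and an additional optimization criterion is what resolves the redundancy more deliberately.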
Locally Weighted Projection Regression: An O(n) Algorithm for Incremental Real Time Learning in High Dimensional Space
 in Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000)
Cited by 101 (17 self)
Locally weighted projection regression is a new algorithm that achieves nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space. This paper evaluates different methods of projection regression and derives a nonlinear function approximator based on them. This nonparametric local learning system i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic cross validation to learn, iii) adjusts its weighting kernels based on local information only, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of (possibly redundant) inputs, as shown in evaluations with up to 50 dimensional data sets. To our knowledge, this is the first truly incremental spatially localized l...
Online EM Algorithm for the Normalized Gaussian Network
1999
Cited by 89 (6 self)
A Normalized Gaussian Network (NGnet) (Moody and Darken 1989) is a network of local linear regression units. The model softly partitions the input space by normalized Gaussian functions and each local unit linearly approximates the output within the partition. In this article, we propose a new online EM algorithm for the NGnet, which is derived from the batch EM algorithm (Xu, Jordan and Hinton 1995) by introducing a discount factor. We show that the online EM algorithm is equivalent to the batch EM algorithm if a specific scheduling of the discount factor is employed. In addition, we show that the online EM algorithm can be considered as a stochastic approximation method to find the maximum likelihood estimator. A new regularization method is proposed in order to deal with a singular input distribution. In order to manage dynamic environments, where the input-output distribution of data changes over time, unit manipulation mechanisms such as unit production, unit deletion...
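The role a discount factor can play in turning a batch statistic into an online one can be illustrated with a generic discounted running mean. The listing does not give the paper's update equations, so the step-size recursion below is an assumed, generic form and not the NGnet algorithm; the function name and the sample stream are hypothetical.

```python
def discounted_mean(samples, discount):
    """Running mean in which older samples are down-weighted by `discount`.

    With discount == 1.0 every sample counts equally and the result is the
    ordinary (batch) mean. With discount < 1.0 recent samples dominate, so
    the estimate can follow a distribution that changes over time.
    """
    eta = 1.0        # step size; the first sample replaces the initial value
    estimate = 0.0
    for t, sample in enumerate(samples):
        if t > 0:
            eta = 1.0 / (1.0 + discount / eta)
        estimate += eta * (sample - estimate)
    return estimate


# Hypothetical stream whose mean jumps from 0 to 10 halfway through.
stream = [0.0] * 50 + [10.0] * 50
batch_like = discounted_mean(stream, 1.0)   # ordinary mean of the stream
tracking = discounted_mean(stream, 0.9)     # weighted toward recent samples
```

With no discounting the step size shrinks as 1/t and the estimate equals the batch average, while a discount below one keeps the step size bounded away from zero so the estimate keeps adapting.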
Trajectory Formation for Imitation with Nonlinear Dynamical Systems
2001
Cited by 77 (5 self)
This article explores a new approach to learning by imitation and trajectory formation by representing movements as mixtures of nonlinear differential equations with well-defined attractor dynamics. An observed movement is approximated by finding a best fit of the mixture model to its data by a recursive least squares regression technique. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy which is robust to strong external perturbations and that can be modified by additional perceptual variables. This movement policy remains the same for a given target, regardless of the initial conditions, and can easily be reused for new targets. We evaluate the trajectory formation system (TFS) in the context of a humanoid robot simulation that is part of the Virtual Trainer (VT) project, which aims at supervising rehabilitation exercises in stroke patients. A typical rehabilitation exercise was collected with a Sarcos Sensuit, a device to record joint angular movement from human subjects, and approximated and reproduced with our imitation technique. Our results demonstrate that multi-joint human movements can be encoded successfully, and that this system allows robust modifications of the movement policy through external variables.