Results 1-10 of 98
A Survey of Robot Learning from Demonstration
"... We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state to action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a ..."
Abstract

Cited by 274 (19 self)
 Add to MetaCart
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state-to-action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude, we discuss LfD limitations and related promising areas for future research.
An application of reinforcement learning to aerobatic helicopter flight
In Advances in Neural Information Processing Systems 19, 2007
"... Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tailin funnel, and nosein funne ..."
Abstract

Cited by 126 (10 self)
 Add to MetaCart
(Show Context)
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tail-in funnel, and nose-in funnel. Our experimental results significantly extend the state of the art in autonomous helicopter flight. We used the following approach: first we had a pilot fly the helicopter to help us find a helicopter dynamics model and a reward (cost) function. Then we used a reinforcement learning (optimal control) algorithm to find a controller that is optimized for the resulting model and reward function. More specifically, we used differential dynamic programming (DDP), an extension of the linear quadratic regulator (LQR).
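The DDP machinery mentioned above reduces, in the purely linear-quadratic case, to an LQR Riccati backward pass. A minimal sketch of that core, using a toy double-integrator model rather than the paper's identified helicopter dynamics:

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon LQR: returns feedback gains K_t with u_t = -K_t x_t.

    DDP generalizes this recursion by re-linearizing nonlinear dynamics
    (and quadraticizing the cost) around a nominal trajectory on each
    iteration; this sketch shows only the linear-quadratic core.
    """
    P = Q.copy()                      # terminal cost-to-go matrix
    gains = []
    for _ in range(horizon):
        # Riccati recursion: minimize x'Qx + u'Ru + (Ax+Bu)'P(Ax+Bu)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]                # gains ordered t = 0 .. horizon-1

# Toy double integrator: position/velocity state, force input (not the
# helicopter model from the paper).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

Ks = lqr_backward_pass(A, B, Q, R, horizon=50)
x = np.array([[1.0], [0.0]])          # start 1 m from the target
for K in Ks:
    x = A @ x - B @ (K @ x)           # closed-loop rollout
```

After the 50-step rollout the state has been regulated close to the origin, which is the behavior the backward pass is designed to produce.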
Knows What It Knows: A Framework For Self-Aware Learning
"... We introduce a learning framework that combines elements of the wellknown PAC and mistakebound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is ..."
Abstract

Cited by 67 (20 self)
 Add to MetaCart
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
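A minimal example in the KWIK spirit is memorizing a deterministic function on a finite input set: the learner answers "I don't know" on unseen inputs (and is then shown the label), and never predicts incorrectly. This sketch is illustrative, not the authors' code:

```python
class KWIKMemorizer:
    """KWIK-style learner for a deterministic function on a finite
    input set: it returns None ("I don't know") on unseen inputs and
    is then shown the true label. It never predicts incorrectly, and
    the number of None responses is bounded by the number of distinct
    inputs -- the two defining KWIK requirements.
    """
    def __init__(self):
        self.memory = {}

    def predict(self, x):
        return self.memory.get(x)     # None signals "I don't know"

    def observe(self, x, y):
        self.memory[x] = y            # label revealed after a None

learner = KWIKMemorizer()
target = lambda x: x * x              # function unknown to the learner
dont_knows = 0
for x in [1, 2, 1, 3, 2, 1]:
    pred = learner.predict(x)
    if pred is None:
        dont_knows += 1
        learner.observe(x, target(x))
    else:
        assert pred == target(x)      # KWIK: no wrong predictions
```

On the six queries above, only the three first-time inputs trigger an "I don't know"; every other answer is guaranteed correct.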
Autonomous helicopter aerobatics through apprenticeship learning
 International Journal of Robotics Research
"... Autonomous helicopter flight is widely regarded to be a highly challenging control problem. Despite this fact, human experts can reliably fly helicopters through a wide range of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s capabilities. We present apprenticeship learning ..."
Abstract

Cited by 62 (2 self)
 Add to MetaCart
(Show Context)
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. Despite this fact, human experts can reliably fly helicopters through a wide range of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s capabilities. We present apprenticeship learning algorithms, which leverage expert demonstrations to efficiently learn good controllers for tasks being demonstrated by an expert. These apprenticeship learning algorithms have enabled us to significantly extend the state of the art in autonomous helicopter aerobatics. Our experimental results include the first autonomous execution of a wide range of maneuvers, including but not limited to in-place flips, in-place rolls, loops and hurricanes, and even autorotation landings, chaos and tic-tocs, which only exceptional human pilots can perform. Our results also include complete airshows, which require autonomous transitions between many of these maneuvers. Our controllers perform as well as, and often even better than, our expert pilot.
A game-theoretic approach to apprenticeship learning
In Advances in Neural Information Processing Systems 22, 2008
"... We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel and Ng [1] who considered a framework in which the true reward function is assumed to be a linear combination of a set of ..."
Abstract

Cited by 61 (3 self)
 Add to MetaCart
(Show Context)
We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel and Ng [1], who considered a framework in which the true reward function is assumed to be a linear combination of a set of known and observable features. We give a new algorithm that, like theirs, is guaranteed to learn a policy that is nearly as good as the expert’s, given enough examples. However, unlike their algorithm, we show that ours may produce a policy that is substantially better than the expert’s. Moreover, our algorithm is computationally faster, is easier to implement, and can be applied even in the absence of an expert. The method is based on a game-theoretic view of the problem, which leads naturally to a direct application of the multiplicative-weights algorithm of Freund and Schapire [2] for playing repeated matrix games. In addition to our formal presentation and analysis of the new algorithm, we sketch how the method can be applied when the transition function itself is unknown, and we provide an experimental demonstration of the algorithm on a toy video-game environment.
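The multiplicative-weights update at the heart of the method can be sketched as follows; the loss values and learning rate are illustrative choices, not from the paper:

```python
import numpy as np

def multiplicative_weights(loss_rows, eta=0.1):
    """Freund-Schapire-style multiplicative-weights update for a
    repeated game: maintain a distribution over actions and shrink
    each action's weight exponentially in the loss it suffered.
    loss_rows[t][i] is the loss of action i in round t (in [0, 1]).
    """
    n = len(loss_rows[0])
    w = np.ones(n)
    total_loss = 0.0
    for losses in loss_rows:
        p = w / w.sum()                       # current mixed strategy
        total_loss += p @ np.asarray(losses)  # expected loss this round
        w *= np.exp(-eta * np.asarray(losses))
    return w / w.sum(), total_loss

# Two actions against a fixed adversary; action 1 is better every
# round (loss 0.2 vs. 0.8). These numbers are made up.
losses = [[0.8, 0.2]] * 200
p, total = multiplicative_weights(losses)
```

After 200 rounds the mixed strategy has concentrated almost all of its mass on the better action, which is the no-regret behavior the algorithm's guarantee formalizes.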
Online linear regression and its application to model-based reinforcement learning
In Advances in Neural Information Processing Systems 20 (NIPS'07), 2007
"... We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a modelbased approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernaliz ..."
Abstract

Cited by 47 (9 self)
 Add to MetaCart
We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a model-based approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernelized) linearly parameterized dynamics. This result builds on Kearns and Singh’s work that provides a provably efficient algorithm for finite-state MDPs. Our approach is not restricted to the linear setting, and is applicable to other classes of continuous MDPs.
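Plain recursive least squares shows the online-update mechanics such an approach builds on. The dynamics vector below is made up, and the paper's algorithm additionally reasons about when its predictions are reliable, which this sketch omits:

```python
import numpy as np

class RecursiveLeastSquares:
    """Online linear regression via recursive least squares: after
    each (x, y) pair, the estimate theta is updated in O(d^2) time
    without storing past data, via the Sherman-Morrison identity.
    Here it fits a single row of linear dynamics, y = theta . x.
    """
    def __init__(self, dim, reg=1.0):
        self.P = np.eye(dim) / reg      # inverse regularized Gram matrix
        self.theta = np.zeros(dim)

    def predict(self, x):
        return self.theta @ x

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)         # Kalman-style gain vector
        self.theta += k * (y - self.theta @ x)
        self.P -= np.outer(k, Px)

rng = np.random.default_rng(0)
true_theta = np.array([0.9, 0.1])       # made-up linear dynamics row
model = RecursiveLeastSquares(dim=2)
for _ in range(500):
    x = rng.normal(size=2)
    model.update(x, true_theta @ x)     # noiseless transitions
```

With noiseless data the estimate converges to the true coefficients, the regularization bias shrinking as more transitions arrive.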
Dynamic imitation in a humanoid robot through nonparametric probabilistic inference
In Proceedings of Robotics: Science and Systems (RSS'06), 2006
"... Abstract — We tackle the problem of learning imitative wholebody motions in a humanoid robot using probabilistic inference in Bayesian networks. Our inferencebased approach affords a straightforward method to exploit rich yet uncertain prior information obtained from human motion capture data. Dyna ..."
Abstract

Cited by 41 (5 self)
 Add to MetaCart
(Show Context)
We tackle the problem of learning imitative whole-body motions in a humanoid robot using probabilistic inference in Bayesian networks. Our inference-based approach affords a straightforward method to exploit rich yet uncertain prior information obtained from human motion capture data. Dynamic imitation implies that the robot must interact with its environment and account for forces such as gravity and inertia during imitation. Rather than explicitly modeling these forces and the body of the humanoid as in traditional approaches, we show that stable imitative motion can be achieved by learning a sensor-based representation of dynamic balance. Bayesian networks provide a sound theoretical framework for combining prior kinematic information (from observing a human demonstrator) with prior dynamic information (based on previous experience) to model and subsequently infer motions which, with high probability, will be dynamically stable. By posing the problem as one of inference in a Bayesian network, we show that methods developed for approximate inference can be leveraged to efficiently perform inference of actions. Additionally, by using nonparametric inference and a nonparametric (Gaussian process) forward model, our approach does not make any strong assumptions about the physical environment or the mass and inertial properties of the humanoid robot. We propose an iterative, probabilistically constrained algorithm for exploring the space of motor commands and show that the algorithm can quickly discover dynamically stable actions for whole-body imitation of human motion. Experimental results based on simulation and subsequent execution by a HOAP-2 humanoid robot demonstrate that our algorithm is able to imitate a human performing actions such as squatting and a one-legged balance.
Learning Robot Motion Control with Demonstration and Advice-Operators
"... Abstract — As robots become more commonplace within society, the need for tools to enable nonroboticsexperts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, where the robot learns a policy from teacher task executions. Our ..."
Abstract

Cited by 33 (15 self)
 Add to MetaCart
(Show Context)
As robots become more commonplace within society, the need for tools that enable non-robotics experts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, where the robot learns a policy from teacher task executions. Our interests lie with robot motion control policies which map world observations to continuous low-level actions. In this work, we introduce Advice-Operator Policy Improvement (A-OPI) as a novel approach for improving policies within LfD. Two distinguishing characteristics of the A-OPI algorithm are data source and continuous state-action space. Within LfD, more example data can improve a policy. In A-OPI, new data is synthesized from a student execution and teacher advice. By contrast, typical demonstration approaches provide the learner with exclusively teacher executions. A-OPI is effective within continuous state-action spaces because high-level human advice is translated into continuous-valued corrections on the student execution. This work presents a first implementation of the A-OPI algorithm, validated on a Segway RMP robot performing a spatial positioning task. A-OPI is found to improve task performance, both in success and accuracy. Furthermore, performance is shown to be similar or superior to the typical exclusively-teacher-demonstrations approach.
A Scalable Method for Solving High-Dimensional Continuous POMDPs Using Local Approximation
In Conf. on Uncertainty in Artificial Intelligence, 2010
"... PartiallyObservable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding beliefMDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent ..."
Abstract

Cited by 31 (1 self)
 Add to MetaCart
(Show Context)
Partially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state of the art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.
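A single-Gaussian EKF belief update, the building block such a planner applies per mixture component, can be sketched as follows; the numerical Jacobians and the 1-D random-walk model are illustrative assumptions, not the paper's domain:

```python
import numpy as np

def ekf_belief_update(mu, Sigma, u, z, f, h, Q, R, eps=1e-5):
    """One EKF step on a Gaussian belief: propagate the mean through
    nonlinear dynamics f(x, u) with process noise Q, then correct with
    observation z under model h(x) and observation noise R. Jacobians
    are taken numerically for simplicity.
    """
    def jacobian(fn, x):
        y0 = fn(x)
        J = np.zeros((len(y0), len(x)))
        for i in range(len(x)):
            dx = np.zeros(len(x)); dx[i] = eps
            J[:, i] = (fn(x + dx) - y0) / eps
        return J

    # Predict: linearize the dynamics around the current mean.
    F = jacobian(lambda x: f(x, u), mu)
    mu_pred = f(mu, u)
    S_pred = F @ Sigma @ F.T + Q
    # Correct: linearize the observation model around the prediction.
    H = jacobian(h, mu_pred)
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)
    mu_new = mu_pred + K @ (z - h(mu_pred))
    S_new = (np.eye(len(mu)) - K @ H) @ S_pred
    return mu_new, S_new

# Toy 1-D example: random-walk dynamics, direct noisy observation.
f = lambda x, u: x + u
h = lambda x: x
mu, Sigma = np.array([0.0]), np.array([[1.0]])
mu, Sigma = ekf_belief_update(mu, Sigma, u=np.array([1.0]),
                              z=np.array([1.2]), f=f, h=h,
                              Q=np.array([[0.1]]), R=np.array([[0.1]]))
```

One update moves the mean most of the way toward the observation (the prior is broad relative to the observation noise) and sharply reduces the belief covariance.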
Autonomic Multi-Agent Management of Power and Performance in Data Centers
"... The rapidly rising cost and environmental impact of energy consumption in data centers has become a multibillion dollar concern globally. In response, the IT Industry is actively engaged in a firsttomarket race to develop energyconserving hardware and software solutions that do not sacrifice perf ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
(Show Context)
The rapidly rising cost and environmental impact of energy consumption in data centers has become a multi-billion-dollar concern globally. In response, the IT industry is actively engaged in a first-to-market race to develop energy-conserving hardware and software solutions that do not sacrifice performance objectives. In this work we demonstrate a prototype of an integrated data center power management solution that employs server management tools, appropriate sensors and monitors, and an agent-based approach to achieve specified power and performance objectives. By intelligently turning off servers under low-load conditions, we can achieve over 25% power savings over the unmanaged case without incurring SLA penalties for typical daily and weekly periodic demands seen in web-server farms. Categories and Subject Descriptors: D.4.8 [Software]: Performance (measurements, modeling and prediction, operational analysis). General Terms: data center, power measurement, multi-criteria utility functions, policy-based management.