Results 11–20 of 117
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Abstract

Cited by 25 (7 self)
We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a coactive online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this coactive preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, while, nevertheless, the theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalization ability of our algorithm on a variety of tasks for which the preferences were influenced not only by the object being manipulated but also by the surrounding environment.
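The kind of update such a coactive framework relies on can be sketched as a preference perceptron: shift a linear utility's weights toward the features of the user's improved trajectory and away from those of the proposed one. A minimal illustration (the feature vectors are invented, and this is a sketch of the idea rather than the paper's exact algorithm):

```python
import numpy as np

def coactive_update(w, phi_proposed, phi_improved, alpha=1.0):
    """One preference-perceptron step: move the utility weights toward the
    features of the user's (slightly) improved trajectory and away from
    the features of the trajectory the system proposed."""
    return w + alpha * (phi_improved - phi_proposed)

# Toy example: three hand-crafted trajectory features.
w = np.zeros(3)
phi_sys = np.array([0.2, 0.9, 0.1])   # features of the proposed trajectory
phi_usr = np.array([0.4, 0.5, 0.3])   # features of the user's improvement
w = coactive_update(w, phi_sys, phi_usr)
```

The system would then propose the trajectory maximizing `w @ phi` at the next round, and repeat.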
Learning parameterized skills
 in: Proceedings of the International Conference on Machine Learning
, 2012
Abstract

Cited by 22 (3 self)
We introduce a method for constructing skills capable of solving tasks drawn from a distribution of parameterized reinforcement learning problems. The method draws example tasks from a distribution of interest and uses the corresponding learned policies to estimate the topology of the lower-dimensional piecewise-smooth manifold on which the skill policies lie. This manifold models how policy parameters change as task parameters vary. The method identifies the number of charts that compose the manifold and then applies nonlinear regression in each chart to construct a parameterized skill by predicting policy parameters from task parameters. We evaluate our method on an underactuated simulated robotic arm tasked with learning to accurately throw darts at a parameterized target location.
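The core idea, predicting policy parameters from task parameters by regression, can be illustrated in miniature. The sketch below fits one polynomial regressor per policy dimension on a made-up smooth task-to-policy relation; the paper's chart identification step is not shown:

```python
import numpy as np

# Hypothetical parameterized skill: map a scalar task parameter (say, a
# dart target distance) to two policy parameters via regression fitted
# on example tasks. The smooth relation below is invented.
task_params = np.linspace(0.5, 2.0, 20)
policy_params = np.stack([np.sqrt(task_params),      # made-up dimension 1
                          1.0 / task_params], axis=1)  # made-up dimension 2

# One degree-3 polynomial regressor per policy dimension.
coeffs = [np.polyfit(task_params, policy_params[:, j], deg=3)
          for j in range(policy_params.shape[1])]

def skill(tau):
    """Predict policy parameters for a new task parameter tau."""
    return np.array([np.polyval(c, tau) for c in coeffs])
```

A real parameterized skill would first split the examples into charts and fit one regressor per chart.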
Learning Complex Motions by Sequencing Simpler Motion Templates
 in International Conference on Machine Learning (ICML)
, 2009
Abstract

Cited by 21 (5 self)
Learning complex motions by sequencing simpler motion templates
Model-Free Reinforcement Learning as Mixture Learning
Abstract

Cited by 19 (2 self)
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied both in MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results.
A Survey of Multi-Objective Sequential Decision-Making
Abstract

Cited by 16 (5 self)
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
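Two of the ingredients in this taxonomy, a Pareto front and a linear scalarization function, are easy to make concrete. A small self-contained sketch (the value vectors are invented):

```python
def pareto_front(values):
    """Return the Pareto-optimal subset of a list of value vectors
    (higher is better in every objective)."""
    def dominated(v, w):
        return (all(wi >= vi for vi, wi in zip(v, w))
                and any(wi > vi for vi, wi in zip(v, w)))
    return [v for v in values if not any(dominated(v, w) for w in values)]

def linear_scalarize(values, weights):
    """Pick the value vector maximizing a weighted sum -- the simplest
    scalarization function."""
    return max(values, key=lambda v: sum(w * x for w, x in zip(weights, v)))

vals = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
front = pareto_front(vals)                  # (1.0, 1.0) is dominated
best = linear_scalarize(vals, (0.7, 0.3))   # preference for objective 1
```

Note that linear scalarization can only ever select points on the convex hull of the front, which is one of the distinctions the taxonomy turns on.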
Policy Search via the Signed Derivative
Abstract

Cited by 16 (1 self)
We consider policy search for reinforcement learning: learning policy parameters, for some fixed policy class, that optimize the performance of a system. In this paper, we propose a novel policy gradient method based on an approximation we call the Signed Derivative; the approximation is based on the intuition that it is often very easy to guess the direction in which control inputs affect future state variables, even if we do not have an accurate model of the system. The resulting algorithm is very simple, requires no model of the environment, and we show that it can outperform standard stochastic estimators of the gradient; indeed, we show that the Signed Derivative algorithm can in fact perform as well as the true (model-based) policy gradient, but without knowledge of the model. We evaluate the algorithm's performance on both a simulated task and two real-world tasks (driving an RC car along a specified trajectory, and jumping onto obstacles with a quadruped robot) and in all cases achieve good performance after very little training.
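The intuition translates into a very small algorithm: replace the unknown model Jacobian in the policy-gradient chain rule with its guessed sign. A toy 1-D tracking sketch (dynamics, gains, and episode lengths are all invented for illustration, not taken from the paper's experiments):

```python
# True dynamics: x' = x + c*u, with gain c UNKNOWN to the learner.
# The learner only assumes the SIGN of dx'/du, which is positive.
c = 0.7
target = 1.0
theta = 0.0      # linear policy: u = theta * (target - x)
alpha = 0.1      # step size

for _ in range(200):                 # episodes
    x = 0.0
    for _ in range(20):              # steps per episode
        err = target - x
        u = theta * err
        x = x + c * u                # environment step (model unknown)
        # Signed-derivative gradient of the squared tracking error:
        # d(cost)/d(theta) ~= 2*(x - target) * sign(dx/du) * du/d(theta)
        grad = 2.0 * (x - target) * 1.0 * err
        theta -= alpha * grad
```

With only the sign (+1) standing in for the true derivative c, the policy gain still converges to the value that drives the state to the target in one step.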
Learning Model-free Robot Control by a Monte Carlo EM
Abstract

Cited by 16 (2 self)
We address the problem of learning robot control by model-free reinforcement learning (RL). We adopt the probabilistic model of Vlassis and Toussaint (2009) for model-free RL, and we propose a Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories. MCEM is related to, and generalizes, the PoWER algorithm of Kober and Peters (2009). In the finite-horizon case MCEM reduces precisely to PoWER, but MCEM can also handle the discounted infinite-horizon case. An interesting result is that the infinite-horizon case can be viewed as a 'randomized' version of the finite-horizon case, in the sense that the length of each sampled trajectory is a random draw from an appropriately constructed geometric distribution. We provide some preliminary experiments demonstrating the effects of a fixed (PoWER) versus randomized (MCEM) horizon length on two simulated and one real robot control task.
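The randomized-horizon observation is simple to state in code: for discount factor gamma, draw each trajectory length from a geometric distribution with success probability 1 - gamma. A minimal sketch:

```python
import random

def sample_horizon(gamma, rng=random):
    """Draw a trajectory length T with P(T = t) = (1 - gamma) * gamma**t
    for t = 0, 1, 2, ... -- the geometric 'randomized horizon' view of
    the discounted infinite-horizon case."""
    t = 0
    while rng.random() < gamma:
        t += 1
    return t

# The expected horizon is gamma / (1 - gamma), e.g. 9 steps for gamma = 0.9,
# so discounting is traded for episodes of random length.
```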
Learning table tennis with a mixture of motor primitives
 in Proceedings of the 10th IEEE-RAS International Conference on Humanoid Robots (Humanoids)
Abstract

Cited by 14 (7 self)
Table tennis is a sufficiently complex motor task for studying complete skill learning systems. It consists of several elementary motions and requires fast movements, accurate control, and online adaptation. To represent the elementary movements needed for robot table tennis, we rely on dynamical systems motor primitives (DMPs). While such DMPs have been successfully used for learning a variety of simple motor tasks, they only represent single elementary actions. In order to select and generalize among different striking movements, we present a new approach, called Mixture of Motor Primitives, that uses a gating network to activate appropriate motor primitives. The resulting policy enables us to select among the appropriate motor primitives as well as to generalize between them. In order to obtain a fully learned robot table tennis setup, we also address the problem of predicting the necessary context information, i.e., the hitting point in time and space where we want to hit the ball. We show that the resulting setup was capable of playing rudimentary table tennis using an anthropomorphic robot arm.
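The gating-network idea can be sketched independently of DMPs: a softmax over per-primitive scores selects and blends primitives for a given context. The primitives and gating weights below are toy stand-ins, not the paper's learned components:

```python
import numpy as np

def gate(context, gating_weights):
    """Softmax gating: per-primitive scores for the current context
    (e.g. the predicted hitting point) become activation probabilities."""
    scores = gating_weights @ context
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

def mixture_action(context, gating_weights, primitives):
    """Blend the primitives' outputs by their gating activations."""
    g = gate(context, gating_weights)
    return sum(gi * p(context) for gi, p in zip(g, primitives))

# Two toy 'primitives' (constant actions) and a hand-set gating matrix.
primitives = [lambda c: 1.0, lambda c: -1.0]
W = np.array([[5.0, 0.0],
              [0.0, 5.0]])
context = np.array([1.0, 0.0])
action = mixture_action(context, W, primitives)  # dominated by primitive 0
```

In the paper's setting the gating network is learned and each primitive is a full striking movement rather than a constant.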
Variational methods for Reinforcement Learning
Abstract

Cited by 14 (3 self)
We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained, from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in the estimate of the transition model, and the resulting policies may consequently lack a sufficient degree of exploration. We consider a Bayesian alternative that maintains a distribution over the transition model so that the resulting policy takes into account the limited experience of the environment. The resulting algorithm is formally intractable, and we discuss two approximate solution methods, Variational Bayes and Expectation Propagation.
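A minimal version of the Bayesian alternative, without the Variational Bayes or Expectation Propagation machinery, keeps a Dirichlet posterior over each row of the transition model and can sample whole models from it. The state/action sizes and the posterior-sampling use are illustrative assumptions:

```python
import numpy as np

n_states, n_actions = 3, 2
# Uniform Dirichlet prior over next-state probabilities per (state, action).
alpha = np.ones((n_states, n_actions, n_states))

def observe(s, a, s_next):
    """Posterior update for one observed transition is just a count."""
    alpha[s, a, s_next] += 1.0

def posterior_mean(s, a):
    """Mean of the Dirichlet posterior over next-state probabilities."""
    return alpha[s, a] / alpha[s, a].sum()

def sample_model(rng):
    """Draw one full transition model from the posterior (usable for
    posterior-sampling-style exploration)."""
    return np.array([[rng.dirichlet(alpha[s, a])
                      for a in range(n_actions)] for s in range(n_states)])
```

Unlike the maximum likelihood point estimate, the posterior keeps the remaining uncertainty explicit, which is what the exploration argument above relies on.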
New inference strategies for solving Markov decision processes using reversible jump MCMC
 in UAI
, 2009
Abstract

Cited by 12 (4 self)
In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.