Results 1–10 of 39
Reinforcement Learning in Robotics: A Survey
Cited by 39 (2 self)
Abstract: Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free methods, as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and …
Analysis of a classification-based policy iteration algorithm
In: Proceedings of the 27th International Conference on Machine Learning, 2010
Cited by 31 (9 self)
Abstract: We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a trade-off between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
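The loop this abstract describes can be pictured on a toy problem: estimate action values with Monte-Carlo rollouts, label each sampled state with its rollout-greedy action, and fit a classifier to those labels. Everything below (the chain MDP, the lookup-table "classifier", the constants) is an illustrative assumption, not the paper's actual construction:

```python
import random

random.seed(0)

# Toy chain MDP: states 0..4, actions 0 = left, 1 = right, reward 1 on
# arriving at (or staying in) the rightmost state.
N, GAMMA, H, K = 5, 0.9, 20, 30   # states, discount, horizon, rollouts per (s, a)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

def mc_return(s, a, policy):
    """One Monte-Carlo rollout: take a in s, then follow policy for H steps."""
    total, disc = 0.0, 1.0
    for _ in range(H):
        s, r = step(s, a)
        total += disc * r
        disc *= GAMMA
        a = policy(s)
    return total

def improve(policy):
    """One classification-based policy-improvement step: label every state
    with its rollout-greedy action, then 'train a classifier' on the labels
    (a plain lookup table here, standing in for a real classifier over a
    large state space)."""
    labels = {}
    for s in range(N):
        q = [sum(mc_return(s, a, policy) for _ in range(K)) / K for a in (0, 1)]
        labels[s] = max((0, 1), key=lambda a: q[a])
    return lambda s: labels[s]

policy = lambda s: random.choice((0, 1))   # initial random policy
for _ in range(3):
    policy = improve(policy)

print([policy(s) for s in range(N)])   # states near the goal map to "right"
```

The trade-off the paper analyzes shows up directly in the two knobs `K` (rollouts, controlling estimation error) and the expressiveness of the classifier (controlling approximation error).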
Rollout sampling approximate policy iteration
In: Machine Learning, 2008
Cited by 30 (4 self)
Abstract: Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to the previous algorithm, achieved, however, with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.
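The bandit view of rollout allocation can be illustrated with plain UCB1: treat each candidate state as an arm, let a "pull" be one more rollout there, and spend the simulation budget where it pays off. The Bernoulli payoffs below are an illustrative stand-in for rollout informativeness, not the paper's actual estimator:

```python
import math
import random

random.seed(1)

# Each candidate state is a bandit arm; pulling arm i means spending one more
# rollout there. Hidden Bernoulli rates simulate how often a rollout at that
# state is informative (illustrative stand-in only).
hidden_rate = [0.2, 0.5, 0.8]       # unknown to the algorithm
counts = [0, 0, 0]
means = [0.0, 0.0, 0.0]

def ucb_pick(t):
    """UCB1: pull each arm once, then maximize mean + exploration bonus."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

for t in range(1, 501):
    i = ucb_pick(t)
    payoff = 1.0 if random.random() < hidden_rate[i] else 0.0
    counts[i] += 1
    means[i] += (payoff - means[i]) / counts[i]   # incremental mean

print(counts)   # the budget concentrates on the most informative state
```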
A Reduction from Apprenticeship Learning to Classification
Cited by 17 (0 self)
Abstract: We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert. We study a common approach to learning from expert demonstrations: using a classification algorithm to learn to imitate the expert's behavior. Although this straightforward learning strategy is widely used in practice, it has been subject to very little formal analysis. We prove that, if the learned classifier has error rate ε, the difference between the value of the apprentice's policy and the expert's policy is O(√ε). Further, we prove that this difference is only O(ε) when the expert's policy is close to optimal. This latter result has an important practical consequence: not only does imitating a near-optimal expert result in a better policy, but far fewer demonstrations are required to successfully imitate such an expert. This suggests an opportunity for substantial savings whenever the expert is known to be good, but demonstrations are expensive or difficult to obtain.
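The classification approach being analyzed is easy to picture in code: collect (state, expert action) pairs and fit any classifier to them. The toy line world and the 1-nearest-neighbour learner below are illustrative assumptions; the paper's contribution is the bound relating the classifier's error rate ε to the policy-value gap, not any particular learner:

```python
import random

random.seed(2)

# Toy line world: states 0..9, goal at 9; the expert steps right until the goal.
GOAL = 9

def expert(s):
    return 1 if s < GOAL else 0    # 1 = step right, 0 = stay

# Expert demonstrations gathered from random start states.
demos = [(s, expert(s)) for s in random.choices(range(10), k=40)]

def apprentice(s):
    """Imitation by classification: 1-nearest-neighbour on the demonstrations."""
    nearest = min(demos, key=lambda d: abs(d[0] - s))
    return nearest[1]

# Empirical classification error of the imitator over all states; the paper's
# bound says the value lost by the apprentice scales with the square root of
# this error (and only linearly when the expert is near-optimal).
err = sum(apprentice(s) != expert(s) for s in range(10)) / 10
print(err)
```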
Machine Learning Techniques—Reductions Between Prediction Quality Metrics
Cited by 9 (0 self)
Abstract: Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
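One concrete member of this family is the rejection-sampling ("costing"-style) reduction from importance-weighted classification to ordinary classification: keep each weighted example with probability w/w_max, then hand the resulting unweighted sample to any off-the-shelf learner. The dataset and the majority-vote "learner" below are purely illustrative:

```python
import random

random.seed(3)

# Importance-weighted examples: (feature, label, weight). Positives (x >= 3)
# carry five times the weight of negatives (illustrative data).
data = [(x, x >= 3, 5.0 if x >= 3 else 1.0) for x in range(10)]

def rejection_sample(weighted):
    """The reduction: keep (x, y, w) with probability w / w_max, producing an
    unweighted sample any standard binary classifier can consume."""
    wmax = max(w for _, _, w in weighted)
    return [(x, y) for x, y, w in weighted if random.random() < w / wmax]

def majority_learner(examples):
    """Stand-in for 'any binary classifier': predict the majority label."""
    label = 2 * sum(y for _, y in examples) >= len(examples)
    return lambda x: label

clf = majority_learner(rejection_sample(data))
print(clf(0))   # True: the heavy positive weights dominate the resampled data
```

The point of a robust reduction is exactly this interface: the weighted problem is solved without modifying the base learner at all.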
Multistage stochastic programming: A scenario tree based approach to planning under uncertainty
In: Applications in Artificial Intelligence: Concepts and Solutions, Chapter 6, 2011
Cited by 8 (6 self)
Abstract: In this chapter, we present the multistage stochastic programming framework for sequential decision making under uncertainty and stress its differences with Markov Decision Processes. We describe the main approximation technique used for solving problems formulated in the multistage stochastic programming framework, which is based on a discretization of the disturbance space. We explain that one issue of the approach is that the discretization scheme leads in practice to ill-posed problems: the complexity of the numerical optimization algorithms used for computing the decisions restricts the number of samples and optimization variables that one can use for approximating expectations, and therefore makes the numerical solutions very sensitive to the parameters of the discretization. As the framework is weak in the absence of efficient tools for evaluating and ultimately selecting competing approximate solutions, we show how one can extend it using machine-learning-based techniques, so as to yield a sound and generic method to solve approximately a large class of multistage decision problems under uncertainty. The framework and solution techniques presented in the chapter are explained and illustrated on several examples. Along the way, we describe notions from decision theory that are relevant to sequential decision making under uncertainty in general.
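A minimal instance of the discretization idea, here a two-stage newsvendor-style problem: sample a handful of demand scenarios, then pick the first-stage decision that is best on average over that small scenario fan. The problem, prices, and sample size are all invented for illustration; they also make the chapter's sensitivity point visible, since re-sampling the scenarios can move the chosen decision:

```python
import random

random.seed(4)

# Two-stage problem: choose an order quantity now; demand is revealed later.
# Profit = PRICE * min(order, demand) - COST * order (illustrative model).
PRICE, COST = 3.0, 1.0

def sample_scenarios(k):
    """Discretize the disturbance space by Monte-Carlo sampling k demands."""
    return [random.uniform(0.0, 10.0) for _ in range(k)]

def expected_profit(order, scenarios):
    return sum(PRICE * min(order, d) - COST * order for d in scenarios) / len(scenarios)

scenarios = sample_scenarios(5)            # a deliberately coarse discretization
candidates = [0.5 * i for i in range(21)]  # order quantities 0.0 .. 10.0
best = max(candidates, key=lambda o: expected_profit(o, scenarios))
print(best)   # sensitive to the 5 sampled scenarios, as the chapter warns
```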
Focus of attention in reinforcement learning
In: Daniela Pucci de Farias, Shie Mannor, Doina Precup, and Georgios Theocharous, editors, AAAI-04 Workshop on Learning and Planning in Markov Processes: Advances and Challenges, 2004
Cited by 7 (5 self)
Abstract: One key topic in reinforcement learning is function approximation, which is critical for successfully applying reinforcement learning to domains with large state spaces. Unfortunately, function approximation can lead to several problems, including suboptimality of the produced policies and even divergence of learning. Thus, reinforcement learning with function approximation has remained an area of active research. In this paper, we demonstrate that in reinforcement learning it is helpful for the agent to focus on more important states, thereby producing better policies using less computing resources. In particular, the problem of focused learning is investigated formally, and a classification-based reinforcement learning method is considered. We first define a formal metric of state importance, and then utilize it in reinforcement learning with function approximation. The advantages of focusing attention on important states are supported both theoretically and empirically.
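The idea of spending updates on important states can be sketched with a prioritized-sweeping-style loop that always backs up the state with the largest Bellman error. Note the paper defines its own formal importance metric; Bellman error here is only a familiar stand-in, and the chain MDP is illustrative:

```python
# Deterministic 6-state chain: every state steps right; reward 1 on entering
# (or staying in) the last state. Values satisfy V[s] = r + GAMMA * V[s+1].
N, GAMMA = 6, 0.9

def backup(V, s):
    s2 = min(N - 1, s + 1)
    r = 1.0 if s2 == N - 1 else 0.0
    return r + GAMMA * V[s2]

V = [0.0] * N
for _ in range(5000):
    errors = [abs(backup(V, s) - V[s]) for s in range(N)]
    s = max(range(N), key=lambda i: errors[i])  # the most "important" state
    if errors[s] < 1e-9:
        break                                   # everything has converged
    V[s] = backup(V, s)

print(round(V[0], 3))   # approaches 0.9 ** 4 * 10 = 6.561
```

On large state spaces the same ordering principle lets a function approximator spend its capacity and its samples where they change the policy most.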
Learning from Demonstration Using MDP Induced Metrics
Cited by 6 (1 self)
Abstract: In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our proposed approach rests on the insight that the MDP structure can be encapsulated into an adequate state-space metric. In particular, we show that, using MDP metrics, we are able to cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to methods based on inverse reinforcement learning at a much lower online computational cost. Our method also attains better generalization than other supervised learning methods that fail to consider the MDP structure.
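The metric idea admits a very small illustration: nearest-neighbour classification of expert actions under a distance derived from the dynamics, rather than from raw state similarity. The two-room line world, the wall, and the action labels below are all invented for the example; the paper's actual MDP-induced metrics are more sophisticated:

```python
# States 0..9 on a line with an impassable wall between 4 and 5; the expert
# walks to the goal of whichever room it is in (0 on the left, 9 on the right).
WALL = 4.5
INF = float('inf')

def mdp_distance(s, t):
    """Steps needed under the dynamics; infinite across the wall."""
    same_room = (s < WALL) == (t < WALL)
    return abs(s - t) if same_room else INF

def raw_distance(s, t):
    return abs(s - t)   # ignores the transition structure entirely

def knn_policy(demos, dist):
    """Learning from demonstration as 1-nearest-neighbour classification."""
    return lambda s: min(demos, key=lambda d: dist(s, d[0]))[1]

demos = [(4, 'left'), (9, 'right')]   # sparse expert demonstrations

pi_mdp = knn_policy(demos, mdp_distance)
pi_raw = knn_policy(demos, raw_distance)

# State 5 sits just across the wall from demo state 4: the raw distance
# imitates the wrong room, while the MDP metric generalizes correctly.
print(pi_mdp(5), pi_raw(5))   # right left
```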
Learning from Demonstrations: Is It Worth Estimating a Reward Function?
Cited by 6 (5 self)
Abstract: This paper provides a comparative study between Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, using Markov Decision Processes (MDP), for the imitation learning problem, in which an agent tries to learn from demonstrations of an expert. In the AL framework, the agent tries to learn the expert policy, whereas in the IRL framework, the agent tries to learn a reward which can explain the behavior of the expert. This reward is then optimized to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not yet really been addressed in the literature. We provide partial answers, from both a theoretical and an empirical point of view.