Results 1 - 10 of 3,599,024
Basis Expansion in Natural Actor Critic Methods

by Sertan Girgin
"... Abstract. In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods try to accomplish this goal by directly approximating the policy using a parametric function approximator; the expected return of the current policy is estimated ..."
Abstract - Cited by 4 (2 self)
constructing a set of basis functions within the context of Natural Actor-Critic (NAC) algorithms. Such basis functions allow more complex policies to be represented, and consequently improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically.
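The excerpt above is about representing policies through basis functions. As a rough illustration only (not the paper's construction), the sketch below builds Gaussian radial-basis features for a linear policy; the 1-D state range, number of centers, and width are arbitrary assumptions.

```python
import numpy as np

def make_rbf_basis(centers, width):
    """Return a feature map phi(s) built from Gaussian radial basis functions."""
    centers = np.asarray(centers, dtype=float)

    def phi(s):
        # One feature per RBF center, plus a constant bias term.
        rbf = np.exp(-0.5 * ((s - centers) / width) ** 2)
        return np.concatenate(([1.0], rbf))

    return phi

# Illustrative usage: 7 RBF centers on a 1-D state space in [-1, 1].
phi = make_rbf_basis(np.linspace(-1.0, 1.0, 7), width=0.3)
theta = np.zeros(8)                      # linear policy parameters (bias + 7 RBFs)
mean_action = theta @ phi(0.25)          # e.g. the mean of a Gaussian policy at s = 0.25
print(mean_action)
```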

Incremental Natural Actor-Critic Algorithms

by Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
"... We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal ..."
Abstract - Cited by 71 (8 self)
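As a loose illustration of the abstract's point that the value-function parameters are estimated with temporal-difference learning, the sketch below performs a single linear TD(0) update; the feature vectors, step size, and discount factor are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def td0_update(v, phi_s, phi_s_next, reward, alpha=0.05, gamma=0.99, done=False):
    """One TD(0) step for a linear value function V(s) = v . phi(s).

    v          : current value-function parameter vector
    phi_s      : feature vector of the current state
    phi_s_next : feature vector of the next state
    """
    target = reward + (0.0 if done else gamma * (v @ phi_s_next))
    delta = target - v @ phi_s            # TD error
    return v + alpha * delta * phi_s, delta

# Illustrative usage with made-up 3-dimensional features.
v = np.zeros(3)
v, delta = td0_update(v, np.array([1.0, 0.0, 0.5]), np.array([0.0, 1.0, 0.5]), reward=1.0)
print(v, delta)
```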

Actor-Critic Algorithms

by Vijay R. Konda, John N. Tsitsiklis - SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001
"... In this paper, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based on in ..."
Abstract - Cited by 242 (1 self)
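The abstract describes a two-time-scale scheme: a linear TD critic and an actor updated in an approximate gradient direction. The following minimal sketch, assuming a softmax policy over discrete actions with linear features and arbitrary step sizes, shows one such paired update; it illustrates the general pattern only, not the algorithm analyzed in the paper.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def actor_critic_step(theta, v, phi_s, phi_s_next, a, reward,
                      alpha_critic=0.1, alpha_actor=0.01, gamma=0.99):
    """One two-time-scale update: a fast linear TD(0) critic and a slower
    softmax policy-gradient actor driven by the TD error."""
    # Critic: TD(0) on the linear value function V(s) = v . phi(s).
    delta = reward + gamma * (v @ phi_s_next) - v @ phi_s
    v = v + alpha_critic * delta * phi_s

    # Actor: softmax over action preferences h(s, a) = theta[a] . phi(s).
    probs = softmax(theta @ phi_s)
    grad_log_pi = -probs[:, None] * phi_s        # d log pi / d theta for all actions
    grad_log_pi[a] += phi_s                      # extra term for the action taken
    theta = theta + alpha_actor * delta * grad_log_pi
    return theta, v

# Illustrative usage: 2 actions, 3-dimensional state features.
theta, v = np.zeros((2, 3)), np.zeros(3)
theta, v = actor_critic_step(theta, v,
                             phi_s=np.array([1.0, 0.0, 0.5]),
                             phi_s_next=np.array([0.0, 1.0, 0.5]),
                             a=1, reward=1.0)
print(theta, v)
```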

Natural Actor-Critic

by Jan Peters, Stefan Schaal, 2007
"... In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a valu ..."
Abstract - Cited by 90 (10 self)
makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning are in fact Natural Actor-Critic
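A central point of this excerpt is the policy-gradient-compatible basis-function parameterization: if the advantage function is fit with the features grad log pi, the fitted weights coincide with the natural policy gradient. The sketch below, assuming a ridge-regularized least-squares fit and made-up sample data, illustrates that idea in isolation rather than the full Natural Actor-Critic algorithm.

```python
import numpy as np

def natural_gradient_from_advantage(grad_log_pi, td_errors, ridge=1e-3):
    """Fit the advantage with the compatible features grad_log_pi (one row per
    sample); under this parameterization the fitted weights w coincide with the
    natural policy gradient, so the actor can step directly along w."""
    X = np.asarray(grad_log_pi, dtype=float)     # compatible features, shape (N, d)
    y = np.asarray(td_errors, dtype=float)       # advantage estimates (e.g. TD errors)
    # Regularised least squares: w = (X^T X + ridge I)^{-1} X^T y.
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

# Illustrative usage with made-up samples (3 policy parameters, 5 transitions).
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 3))                  # grad log pi(a_t | s_t) per sample
deltas = rng.normal(size=5)                      # per-sample TD errors
theta = np.zeros(3)
theta = theta + 0.05 * natural_gradient_from_advantage(grads, deltas)
print(theta)
```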

Supervised Actor-Critic Reinforcement Learning

by Michael T. Rosenstein, Andrew G. Barto
"... INTRODUCTION Reinforcement learning (RL) and supervised learning are usually portrayed as distinct methods of learning from experience. RL methods are often applied to problems involving sequential dynamics and optimization of a scalar performance objective, with online exploration of the effects o ..."
Abstract

Natural-Gradient Actor-Critic Algorithms

by Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee, 2007
"... We prove the convergence of four new reinforcement learning algorithms based on the actorcritic architecture, on function approximation, and on natural gradients. Reinforcement learning is a class of methods for solving Markov decision processes from sample trajectories under lack of model informati ..."
Abstract
results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural-gradient actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence

Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

by Paul Wagner
"... Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods. As our first main result, we show that an important subset of the latter methodology is, in fact, a limiting special case ..."
Abstract
case of a general formulation of the former methodology; optimistic policy iteration encompasses not only most of the greedy value function methods but also natural actor-critic methods, and permits one to directly interpolate between them. The resulting continuum adjusts the strength of the Markov

Linear off-policy actor-critic

by Thomas Degris, Martha White, Richard S. Sutton - In International Conference on Machine Learning, 2012
"... This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and d ..."
Abstract - Cited by 7 (0 self)
and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more
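As a rough sketch of the off-policy idea in this excerpt, the update below reweights both the critic and the actor by the importance-sampling ratio pi(a|s) / b(a|s) of the target policy against the behavior policy; the linear critic, step sizes, and sample values are assumptions for illustration, and this is not the gradient-TD-based algorithm the paper actually develops.

```python
import numpy as np

def off_policy_ac_step(theta, v, phi_s, phi_s_next, reward,
                       pi_a, b_a, grad_log_pi,
                       alpha_critic=0.05, alpha_actor=0.01, gamma=0.99):
    """One importance-weighted actor-critic step while following a behavior
    policy b: the ratio rho = pi(a|s) / b(a|s) reweights both updates so the
    target policy pi can be improved from off-policy data."""
    rho = pi_a / b_a                                          # importance-sampling ratio
    delta = reward + gamma * (v @ phi_s_next) - v @ phi_s     # TD error
    v = v + alpha_critic * rho * delta * phi_s                # weighted critic step
    theta = theta + alpha_actor * rho * delta * grad_log_pi   # weighted actor step
    return theta, v

# Illustrative usage with made-up quantities.
theta, v = np.zeros(3), np.zeros(3)
theta, v = off_policy_ac_step(theta, v,
                              phi_s=np.array([1.0, 0.0, 0.5]),
                              phi_s_next=np.array([0.0, 1.0, 0.5]),
                              reward=0.5, pi_a=0.6, b_a=0.5,
                              grad_log_pi=np.array([0.2, -0.1, 0.3]))
print(theta, v)
```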