Results 1 - 10 of 198,754

Solving Sensor Network Coverage Problems by Distributed Asynchronous Actor-Critic Methods

by Paris Pennesi, Ioannis Ch. Paschalidis
"... Abstract — Multi-robots systems exploiting sensor network capabilities can be successfully employed to cope with several tasks, including coverage, surveillance, target tracking, and foraging, in partially known environments subject to dynamical changes. In this paper we define a reward collection p ..."
Abstract - Cited by 3 (2 self) - Add to MetaCart
problem, where both the positions and the values of the rewards change with time. We propose a distributed actor-critic method to solve the problem, establish its convergence, and demonstrate its adaptation capabilities. Our analysis leverages ideas from actor-critic methods and consensus algorithms

Basis Expansion in Natural Actor Critic Methods

by Sertan Girgin
"... Abstract. In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods try to accomplish this goal by directly approximating the policy using a parametric function approximator; the expected return of the current policy is estimated ..."
Abstract - Cited by 4 (2 self)
constructing a set of basis functions within the context of Natural Actor-Critic (NAC) algorithms. Such basis functions allow more complex policies to be represented, and consequently improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically.

Temporal logic motion control using actor-critic methods

by Jing Wang, Xuchu Ding, Morteza Lahijanian, Ioannis Ch. Paschalidis, Calin A. Belta - In IEEE International Conference on Robotics and Automation (ICRA), 2012
"... This paper considers the problem of deploying a robot from a specification given as a tempo-ral logic statement about some properties satisfied by the regions of a large, partitioned environ-ment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of th ..."
Abstract - Cited by 2 (0 self)
the necessary optimization problem for the optimal policy, are computationally intensive. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot
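
The least-squares temporal difference (LSTD) critic these motion-control entries refer to admits a compact summary. Below is a minimal sketch of an LSTD(0) value estimate computed from sampled transitions, assuming a generic feature map phi and a (s, r, s_next, done) transition format; the cited framework embeds such a critic inside an actor-critic loop over the robot's sample paths, which this sketch does not reproduce.

    import numpy as np

    def lstd_value_weights(transitions, phi, d, gamma=0.99, reg=1e-6):
        # transitions: iterable of (s, r, s_next, done) tuples taken from
        # sample paths; phi maps a state to a length-d feature vector.
        A = reg * np.eye(d)              # small ridge term keeps A invertible
        b = np.zeros(d)
        for s, r, s_next, done in transitions:
            f = phi(s)
            f_next = np.zeros(d) if done else phi(s_next)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        return np.linalg.solve(A, b)     # weights w with phi(s) @ w ~ V(s)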

Temporal Logic Motion Control using Actor-Critic Methods

by unknown authors
"... Abstract — In this paper, we consider the problem of deploy-ing a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through t ..."
Abstract
as solving the necessary optimization problem for the optimal policy are usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor-critic type. This framework operates

Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

by unknown authors
"... Abstract — We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic model ..."
Abstract
the effectiveness of the proposed solution. Index Terms — Markov Decision Processes, dynamic programming, actor-critic methods, robot motion control, robotics.

Incremental Natural Actor-Critic Algorithms

by Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
"... We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal ..."
Abstract - Cited by 71 (8 self)
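
The natural-gradient idea behind these algorithms can be made concrete: with "compatible" features psi = grad log pi, the weight vector of a linear advantage critic is itself an estimate of the natural policy gradient, so the actor can step along it directly. The per-transition update below is a minimal sketch under that idea; the function names, feature maps, and step sizes are illustrative assumptions, not the four algorithms from the paper.

    import numpy as np

    def natural_ac_step(theta, w, v, s, a, r, s_next, done,
                        phi, grad_log_pi, gamma=0.99,
                        alpha_v=0.05, alpha_w=0.05, alpha_theta=0.005):
        # One transition's worth of updates. phi(s) are state features
        # for the value baseline; grad_log_pi gives compatible features.
        psi = grad_log_pi(theta, s, a)
        v_next = 0.0 if done else phi(s_next) @ v
        delta = r + gamma * v_next - phi(s) @ v      # TD error
        v = v + alpha_v * delta * phi(s)             # state-value baseline
        w = w + alpha_w * (delta - psi @ w) * psi    # fit psi @ w ~ A(s, a)
        theta = theta + alpha_theta * w              # natural-gradient step
        return theta, w, v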

Actor-Critic Algorithms

by Vijay R. Konda, John N. Tsitsiklis - SIAM Journal on Control and Optimization, 2001
"... In this paper, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based on in ..."
Abstract - Cited by 242 (1 self)
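
The two-time-scale structure this abstract describes is straightforward to sketch: the critic takes TD(0) steps with a linearly parameterized value function on a fast step size, while the actor moves in the approximate gradient direction on a slower one. The softmax policy, feature maps, and environment interface below are illustrative assumptions; this is a sketch in the spirit of the paper, not its exact algorithm.

    import numpy as np

    def softmax_probs(theta, phi_sa):
        # phi_sa: (n_actions, d) array of state-action features.
        prefs = phi_sa @ theta
        prefs -= prefs.max()                        # numerical stability
        p = np.exp(prefs)
        return p / p.sum()

    def actor_critic(env, phi_s, phi_sa, d_v, d_p, gamma=0.99,
                     episodes=500, alpha_w=0.05, alpha_theta=0.005):
        w = np.zeros(d_v)       # critic weights, fast time scale
        theta = np.zeros(d_p)   # actor weights, slow time scale
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                probs = softmax_probs(theta, phi_sa(s))
                a = np.random.choice(len(probs), p=probs)
                s_next, r, done = env.step(a)       # assumed interface
                v_next = 0.0 if done else phi_s(s_next) @ w
                delta = r + gamma * v_next - phi_s(s) @ w   # TD(0) error
                w += alpha_w * delta * phi_s(s)             # critic step
                score = phi_sa(s)[a] - probs @ phi_sa(s)    # grad log pi
                theta += alpha_theta * delta * score        # actor step
                s = s_next
        return theta, w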

Linear off-policy actor-critic

by Thomas Degris, Martha White, Richard S. Sutton - In International Conference on Machine Learning, 2012
"... This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and d ..."
Abstract - Cited by 7 (0 self)
and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more
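
The off-policy correction the snippet alludes to is an importance ratio rho = pi(a|s) / b(a|s) applied to the updates, so that data gathered under a behavior policy b still yields gradients for the target policy pi. The sketch below is a simplified per-transition update: the actual Off-PAC critic uses gradient-TD methods with eligibility traces, so the plain importance-weighted TD(0) step here, like all the interfaces, is an assumption made to keep the example short.

    import numpy as np

    def off_policy_ac_step(theta, w, s, a, r, s_next, done, phi,
                           pi_prob, b_prob, grad_log_pi,
                           gamma=0.99, alpha_w=0.05, alpha_theta=0.005):
        # Data is generated by behavior policy b; we learn target policy pi.
        rho = pi_prob(theta, s, a) / b_prob(s, a)    # importance ratio
        v_next = 0.0 if done else phi(s_next) @ w
        delta = r + gamma * v_next - phi(s) @ w      # TD error
        w = w + alpha_w * rho * delta * phi(s)       # reweighted critic step
        theta = theta + alpha_theta * rho * delta * grad_log_pi(theta, s, a)
        return theta, w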