Results 1 - 10 of 198,754
Solving Sensor Network Coverage Problems by Distributed Asynchronous Actor-Critic Methods
"... Abstract — Multi-robots systems exploiting sensor network capabilities can be successfully employed to cope with several tasks, including coverage, surveillance, target tracking, and foraging, in partially known environments subject to dynamical changes. In this paper we define a reward collection p ..."
Cited by 3 (2 self)
problem, where both the positions and the values of the rewards change with time. We propose a distributed actor-critic method to solve the problem, establish its convergence, and demonstrate its adaptation capabilities. Our analysis leverages ideas from actor-critic methods and consensus algorithms
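To make the combination concrete, a minimal sketch of one such loop follows, assuming linear value features, a doubly stochastic weight matrix for a four-agent ring, and stand-in local rewards and dynamics; none of these specifics come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, gamma, alpha = 4, 8, 0.95, 0.05   # agents, features, discount, step size

# Doubly stochastic consensus weights for a ring of 4 agents (an assumption).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

theta = np.zeros((N, d))          # one linear critic parameter vector per agent

def features(s):                  # toy one-hot feature map (an assumption)
    phi = np.zeros(d)
    phi[s % d] = 1.0
    return phi

s = rng.integers(0, d, size=N)    # each agent's local state
for t in range(1000):
    # 1) Local TD(0) step on each agent's own transition and reward.
    for i in range(N):
        s_next = rng.integers(0, d)        # stand-in for local dynamics
        r = rng.normal()                   # stand-in for the local reward
        delta = r + gamma * features(s_next) @ theta[i] - features(s[i]) @ theta[i]
        theta[i] += alpha * delta * features(s[i])
        s[i] = s_next
    # 2) Consensus step: average critic parameters with neighbors.
    theta = W @ theta
```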
Basis Expansion in Natural Actor Critic Methods
"... Abstract. In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods try to accomplish this goal by directly approximating the policy using a parametric function approximator; the expected return of the current policy is estimated ..."
Cited by 4 (2 self)
constructing a set of basis functions within the context of Natural Actor-Critic (NAC) algorithms. Such basis functions allow more complex policies to be represented, and consequently improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically.
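As a toy illustration of why the basis matters, the sketch below builds a softmax policy on radial basis features; all specific choices (centers, width, two actions) are assumptions, not taken from the paper.

```python
import numpy as np

# A richer basis over the state enlarges the class of representable
# Gibbs (softmax) policies, which is the point of basis expansion.
centers = np.linspace(0.0, 1.0, 10)     # 10 radial basis functions (assumed)
theta = np.zeros((2, len(centers)))     # policy weights, one row per action

def rbf(s, width=0.1):
    """Radial basis features of a scalar state s."""
    return np.exp(-((s - centers) ** 2) / (2 * width ** 2))

def policy(s):
    """Action probabilities under the softmax policy over expanded features."""
    prefs = theta @ rbf(s)
    p = np.exp(prefs - prefs.max())     # subtract max for numerical stability
    return p / p.sum()
```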
Temporal logic motion control using actor-critic methods
- In IEEE International Conference on Robotics and Automation (ICRA), 2012
"... This paper considers the problem of deploying a robot from a specification given as a tempo-ral logic statement about some properties satisfied by the regions of a large, partitioned environ-ment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of th ..."
Cited by 2 (0 self)
the necessary optimization problem for the optimal policy, are computationally intensive. To address these issues, we propose an approximate dynamic programming framework based on a least-square temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot
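A common construction in this line of work, sketched here only as an assumption since the snippet does not spell it out, is to translate the temporal logic formula into an automaton and optimize the policy on the product of the robot model and that automaton. A toy illustration with a made-up three-region MDP and an "eventually goal" automaton:

```python
from itertools import product

# Hypothetical three-region robot MDP; labels mark the property each region
# satisfies (all names here are made up for illustration).
mdp_states = ["r1", "r2", "r3"]
labels = {"r1": "safe", "r2": "safe", "r3": "goal"}

# Toy two-state automaton for an "eventually goal" specification.
aut_states = ["q0", "q_acc"]

def aut_step(q, label):
    """Advance the automaton on the label of the region just entered."""
    return "q_acc" if (q == "q_acc" or label == "goal") else "q0"

# The policy is then optimized over (region, automaton state) pairs; reaching
# an accepting pair corresponds to satisfying the formula.
product_states = list(product(mdp_states, aut_states))
accepting = [(s, q) for (s, q) in product_states if q == "q_acc"]

# One product transition: entering r3 from automaton state q0 accepts.
assert aut_step("q0", labels["r3"]) == "q_acc"
```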
Temporal Logic Motion Control using Actor-Critic Methods
"... Abstract — In this paper, we consider the problem of deploy-ing a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through t ..."
as solving the necessary optimization problem for the optimal policy are usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-square temporal difference learning method of the actor-critic type. This framework operates
Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control
"... Abstract — We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic model ..."
the effectiveness of the proposed solution. Index Terms — Markov Decision Processes, dynamic programming, actor-critic methods, robot motion control, robotics.
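As a rough illustration of the critic side, here is a minimal least-squares temporal difference (LSTD) solver; the feature map, ridge term, and data format are assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.95, reg=1e-6):
    """Least-squares TD: solve A w = b from sampled (s, r, s') transitions.

    phi maps a state to a feature vector; reg is a small ridge term
    (an assumption, added only to keep A invertible on short sample paths).
    """
    d = len(phi(transitions[0][0]))
    A = reg * np.eye(d)
    b = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)   # critic weights: V(s) is approximated by phi(s) @ w

# Example usage with one-hot features over two states (illustrative only):
phi = lambda s: np.eye(2)[s]
w = lstd([(0, 1.0, 1), (1, 0.0, 0)], phi)
```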
Incremental Natural Actor-Critic Algorithms
"... We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal ..."
Cited by 71 (8 self)
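The incremental flavor of these natural-gradient updates can be sketched as follows; the score vector and TD error are random stand-ins here, so this shows the shape of the coupled updates rather than any of the four algorithms in full.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
theta = np.zeros(d)            # policy parameters
w = np.zeros(d)                # advantage weights over compatible features
alpha, beta = 0.1, 0.01        # critic step larger than actor step (two time scales)

for t in range(500):
    psi = rng.normal(size=d)   # stand-in for grad log pi(a|s; theta)
    delta = rng.normal()       # stand-in for the TD error from the critic
    # Regress the TD error onto the compatible features: with this
    # parameterization, w estimates the natural policy gradient.
    w += alpha * (delta - psi @ w) * psi
    # Natural-gradient actor step: move along w directly, no Fisher inverse.
    theta += beta * w
```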
Actor-Critic Algorithms
- SIAM Journal on Control and Optimization, 2001
"... In this paper, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based on in ..."
Cited by 242 (1 self)
In this paper, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based
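The two-time-scale structure is the load-bearing assumption: the critic must adapt faster than the actor, so the critic effectively tracks each policy before the actor moves. A sketch of step-size schedules with that property (illustrative choices, not the paper's):

```python
import math

# Both schedules satisfy the usual stochastic-approximation conditions
# (steps sum to infinity, squared steps are summable), and their ratio
# actor_stepsize(k) / critic_stepsize(k) = 1 / log(k + 2) tends to zero.
def critic_stepsize(k):
    """alpha_k: the faster time scale, used by the TD critic."""
    return 1.0 / (k + 1)

def actor_stepsize(k):
    """beta_k: the slower time scale, used by the policy update."""
    return 1.0 / ((k + 1) * math.log(k + 2))
```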
Linear off-policy actor-critic
- In International Conference on Machine Learning, 2012
"... This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and d ..."
Cited by 7 (0 self)
and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more
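The device that separates behavior from target policy is an importance-sampling ratio on each transition. A minimal sketch, with pi, b, and v as assumed callables rather than the paper's API:

```python
def off_policy_td_error(r, s, s_next, a, v, pi, b, gamma=0.99):
    """One importance-weighted TD step in the off-policy actor-critic spirit.

    pi(a, s) and b(a, s) are assumed callables giving target- and
    behavior-policy probabilities; v maps states to value estimates.
    """
    rho = pi(a, s) / b(a, s)            # importance-sampling ratio
    return rho * (r + gamma * v(s_next) - v(s))
```

Roughly, the same ratio also weights the actor's gradient in such methods, which is what keeps updates valid even though the actions are drawn from the behavior policy.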