Results 1–10 of 147
Recent advances in hierarchical reinforcement learning
, 2003
Cited by 229 (24 self)
Abstract
"... A preliminary unedited version of this paper was incorrectly published as part of Volume ..."
Basis function adaptation in temporal difference reinforcement learning
 Annals of Operations Research
, 2005
Cited by 73 (4 self)
Abstract
Reinforcement Learning (RL) is an approach for solving complex multistage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of prefixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis function is important for the success of the algorithm. In the present paper we examine methods for adapting the basis function during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis function while simultaneously adapting the (nonlinear) basis function parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations.
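The abstract above can be made concrete with a minimal sketch. This is our own toy rendering, not the paper's code: policy evaluation on a small chain MDP, a linear architecture over Gaussian basis functions, and joint gradient descent on the squared Bellman error over both the linear weights and the (nonlinear) basis centers. The chain MDP, the finite-difference gradients, and every name below are our own stand-ins; the paper's second (Cross Entropy) algorithm is not shown.

```python
import math

N_STATES, GAMMA, SIGMA = 10, 0.9, 1.5

def next_state(s):
    return min(s + 1, N_STATES - 1)          # deterministic fixed policy: step right

def reward(s):
    return 1.0 if next_state(s) == N_STATES - 1 else 0.0

def v(s, w, c):
    """Linear value estimate: weighted sum of Gaussian basis functions."""
    return sum(wi * math.exp(-((s - ci) ** 2) / (2 * SIGMA ** 2))
               for wi, ci in zip(w, c))

def bellman_error(w, c):
    """Mean squared Bellman residual over all states (deterministic policy)."""
    return sum((reward(s) + GAMMA * v(next_state(s), w, c) - v(s, w, c)) ** 2
               for s in range(N_STATES)) / N_STATES

def adapt(w, c, lr=0.05, steps=1500, eps=1e-4):
    """Joint finite-difference gradient descent on weights AND basis centers."""
    for _ in range(steps):
        base = bellman_error(w, c)
        grads = []
        for params in (w, c):                # perturb each parameter in turn
            for i in range(len(params)):
                params[i] += eps
                grads.append((bellman_error(w, c) - base) / eps)
                params[i] -= eps
        k = 0
        for params in (w, c):                # apply the descent step
            for i in range(len(params)):
                params[i] -= lr * grads[k]
                k += 1
    return w, c

w, c = [0.0] * 4, [2.0, 4.0, 6.0, 8.0]       # initial weights and basis centers
before = bellman_error(w, c)
w, c = adapt(w, c)
after = bellman_error(w, c)
```

The point of the sketch is that the basis centers `c` move along with the weights `w`, which is what distinguishes basis adaptation from ordinary linear TD fitting with fixed features.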
Learning from Observation Using Primitives
, 2004
Cited by 70 (5 self)
Abstract
This paper describes the use of task primitives in robot learning from observation. A framework has been developed that uses observed data to initially learn a task and then the agent goes on to increase its performance through repeated task performance (learning from practice). Data that is collected while a human performs a task is parsed into small parts of the task called primitives. Modules are created for each primitive that encode the movements required during the performance of the primitive, and when and where the primitives are performed. The feasibility of this method is currently being tested with agents that learn to play a virtual and an actual air hockey game.
Identifying useful subgoals in reinforcement learning by local graph partitioning
 In Proceedings of the Twenty-Second International Conference on Machine Learning
, 2005
Cited by 69 (10 self)
Abstract
We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs—those that are constructed using only the most recent experiences of the agent. The local scope of our subgoal discovery method allows it to successfully identify the type of subgoals we seek—states that lie between two densely connected regions of the state space—while producing an algorithm with low computational cost.
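A rough illustration of the idea above, with one caveat: the paper partitions local transition graphs with a cut criterion, while this sketch substitutes a simpler proxy, articulation points, for "states that lie between two densely connected regions". The toy transition graph (two clusters joined through a doorway node) and all names are our own.

```python
from collections import defaultdict

def build_graph(transitions):
    """Undirected local transition graph from recent (s, s') experience."""
    g = defaultdict(set)
    for s, s2 in transitions:
        g[s].add(s2)
        g[s2].add(s)
    return g

def articulation_points(g):
    """Hopcroft–Tarjan DFS low-link computation of cut vertices."""
    disc, low, points, timer = {}, {}, set(), [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in g[u]:
            if v == parent:
                continue
            if v in disc:                      # back edge
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)              # removing u splits the graph
        if parent is None and children > 1:
            points.add(u)                      # root with 2+ DFS subtrees

    for u in list(g):
        if u not in disc:
            dfs(u, None)
    return points

# Two densely connected clusters {0,1,2} and {4,5,6} joined via doorway 3.
transitions = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
subgoals = articulation_points(build_graph(transitions))
```

Running this marks the doorway state (and its immediate neighbors on each side) as subgoal candidates, while states interior to either cluster are not flagged.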
Building portable options: Skill transfer in reinforcement learning
 Proceedings of the 20th International Joint Conference on Artificial Intelligence
, 2007
Cited by 57 (12 self)
Abstract
The options framework provides methods for reinforcement learning agents to build new high-level skills. However, since options are usually learned in the same state space as the problem the agent is solving, they cannot be used in other tasks that are similar but have different state spaces. We introduce the notion of learning options in agent-space, the space generated by a feature set that is present and retains the same semantics across successive problem instances, rather than in problem-space. Agent-space options can be reused in later tasks that share the same agent-space but have different problem-spaces. We present experimental results demonstrating the use of agent-space options in building transferable skills, and show that they perform best when used in conjunction with problem-space options.
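A minimal sketch of the agent-space idea, under assumptions of our own making (the tasks, the feature names, and the option's policy are all hypothetical): the option's initiation set, policy, and termination condition are defined over shared agent-space features, so one option object serves two tasks whose raw problem-space states differ.

```python
class AgentSpaceOption:
    """An option defined over agent-space features, not raw states."""
    def __init__(self, policy, can_start, is_done):
        self.policy = policy          # features -> action
        self.can_start = can_start    # initiation-set membership test
        self.is_done = is_done        # termination condition

    def act(self, features):
        return self.policy(features)

# Task A: states are (x, y) grid cells. Task B: states are integer room ids.
# Both expose the same agent-space feature: distance to the nearest door.
def features_task_a(state):
    x, y = state
    return {"door_dist": abs(x - 5) + abs(y - 0)}   # door at (5, 0), assumed

def features_task_b(state):
    return {"door_dist": abs(state - 3)}            # door in room 3, assumed

go_to_door = AgentSpaceOption(
    policy=lambda f: "move_toward_door" if f["door_dist"] > 0 else "enter",
    can_start=lambda f: f["door_dist"] < 10,
    is_done=lambda f: f["door_dist"] == 0,
)

a_action = go_to_door.act(features_task_a((2, 1)))   # problem-space: grid cell
b_action = go_to_door.act(features_task_b(3))        # problem-space: room id
```

The same `go_to_door` object is consulted in both tasks; only the feature extractors are task-specific, which is what makes the skill portable.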
Q-Cut: Dynamic Discovery of Subgoals in Reinforcement Learning
 Machine Learning: ECML 2002, 13th European Conference on Machine Learning, volume 2430 of Lecture Notes in Computer Science
, 2002
Cited by 50 (5 self)
Abstract
We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of subgoals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm. The learning agent creates an online map of the process history, and uses an efficient Max-Flow/Min-Cut algorithm for identifying bottlenecks. The policies for reaching bottlenecks are separately learned and added to the model in a form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state space partitioning, necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
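The Max-Flow/Min-Cut step can be sketched directly. This is an illustrative stand-in, not the paper's implementation: a plain Edmonds–Karp max-flow on a toy two-room transition graph, after which the min-cut edges are read off the residual graph as bottleneck candidates. The graph and the unit edge capacities are our own (the paper derives capacities from observed transition frequencies).

```python
from collections import deque, defaultdict

def undirected(edges):
    """Symmetric unit capacities: transitions are traversable both ways."""
    cap = defaultdict(dict)
    for u, v in edges:
        cap[u][v] = 1
        cap[v][u] = 1
    return cap

def min_cut(capacity, source, sink):
    """Edmonds–Karp max-flow; the cut is the set of saturated edges leaving
    the source-reachable side of the final residual graph."""
    flow = defaultdict(int)

    def augmenting_path():
        parent = {source: None}
        q = deque([source])
        while q:
            u = q.popleft()
            for v in capacity[u]:
                if v not in parent and capacity[u][v] - flow[(u, v)] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    q.append(v)
        return None

    while True:
        parent = augmenting_path()
        if parent is None:
            break
        push, v = float("inf"), sink          # bottleneck along the path
        while parent[v] is not None:
            u = parent[v]
            push = min(push, capacity[u][v] - flow[(u, v)])
            v = u
        v = sink                               # augment along the path
        while parent[v] is not None:
            u = parent[v]
            flow[(u, v)] += push
            flow[(v, u)] -= push
            v = u

    seen, q = {source}, deque([source])        # residual reachability
    while q:
        u = q.popleft()
        for v in capacity[u]:
            if v not in seen and capacity[u][v] - flow[(u, v)] > 0:
                seen.add(v)
                q.append(v)
    return [(u, v) for u in seen for v in capacity[u] if v not in seen]

# Two triangle "rooms" {A,B,C} and {D,E,F} joined only by the edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
cut = min_cut(undirected(edges), "A", "F")
```

On this graph the computed cut is the single doorway edge between the two rooms, i.e. exactly the bottleneck a subgoal-discovery method is after.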
Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization
 Proceedings of the 8th Conference on Intelligent Autonomous Systems, IAS-8
, 2004
Cited by 41 (4 self)
Abstract
We introduce a new method for hierarchical reinforcement learning. High-level policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. Experiments show that this method outperforms several flat reinforcement learning methods in a deterministic task and in a stochastic task.
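A compact sketch of "subgoals as desired abstract observations", on our own toy data: raw observations are quantized to the nearest of a few cluster centers, the high level picks a cluster index as a subgoal, and the low level has reached it when the current observation quantizes to that index. The centers are fixed here; the paper learns the clustering.

```python
import math

CENTERS = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]   # abstract observations (assumed)

def quantize(obs):
    """Map a raw observation to the index of the nearest cluster center."""
    return min(range(len(CENTERS)), key=lambda i: math.dist(obs, CENTERS[i]))

def subgoal_reached(obs, subgoal_idx):
    """The low level terminates when the observation falls in the subgoal cluster."""
    return quantize(obs) == subgoal_idx

cluster = quantize((4.2, 5.5))         # quantizes to the middle center
done = subgoal_reached((8.8, 1.3), 2)  # observation lies in cluster 2
```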
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
Cited by 39 (8 self)
Abstract
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains.
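The chaining mechanism can be sketched in a few lines, again as our own toy rather than the paper's implementation: skills are grown backward from the goal of a 1-D continuous domain, each new skill's termination target being the initiation boundary of the previously created skill, so executing the chain forward reaches the end-of-task reward. The interval domain, the `reach` parameter, and the jump-to-target "policy" are all hypothetical simplifications.

```python
class Skill:
    """A skill with initiation set [low, high) and a hand-off target."""
    def __init__(self, low, high, target):
        self.low, self.high, self.target = low, high, target

    def applicable(self, x):
        return self.low <= x < self.high

    def run(self, x):
        # Stand-in for a learned local policy: jump to the hand-off point.
        assert self.applicable(x)
        return self.target

def chain_skills(start, goal, reach=3.0):
    """Grow skills backward from the goal until the start state is covered;
    each skill terminates where the previously created skill can initiate."""
    skills, target = [], goal
    while target > start:
        low = max(start, target - reach)
        skills.append(Skill(low, target, target))
        target = low
    return skills

def execute(skills, x, goal):
    """Run whichever skill is applicable until the goal is reached."""
    steps = 0
    while x < goal:
        skill = next(s for s in skills if s.applicable(x))
        x = skill.run(x)
        steps += 1
    return x, steps

skills = chain_skills(start=0.0, goal=10.0)
final_x, steps = execute(skills, 0.0, 10.0)
```

Each hand-off lands inside the next skill's initiation set by construction, which is the invariant that makes the chain sound.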