Results 1–10 of 41
Discovering hierarchy in reinforcement learning with HEXQ
In Nineteenth International Conference on Machine Learning, 2002
"... An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a modelfree factored MDP hierarchically is described. By searching for aliased Markov subspace regions based on the state variables the algorithm ..."
Abstract

Cited by 95 (5 self)
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP hierarchically, is described. By searching for aliased Markov subspace regions based on the state variables, the algorithm uses temporal and state abstraction to construct a hierarchy of interlinked smaller MDPs.
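HEXQ's decomposition starts from an ordering of the state variables by how frequently they change along experienced trajectories. A minimal sketch of that ordering step (a hypothetical helper, not the paper's implementation):

```python
from collections import defaultdict

def variable_change_frequencies(trajectory):
    """Order state-variable indices by how often each changes value.

    HEXQ-style decomposition puts fast-changing variables at the lowest
    level of the hierarchy. `trajectory` is a list of state tuples; only
    variables that change at least once appear in the ranking.
    """
    counts = defaultdict(int)
    for prev, nxt in zip(trajectory, trajectory[1:]):
        for i, (a, b) in enumerate(zip(prev, nxt)):
            if a != b:
                counts[i] += 1
    # Most frequently changing variable first.
    return sorted(counts, key=counts.get, reverse=True)

# Example: variable 0 (e.g. position) changes every step,
# variable 1 (e.g. room) changes once.
traj = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)]
print(variable_change_frequencies(traj))  # → [0, 1]
```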
An Integrated Approach to Hierarchy and Abstraction for POMDPs
2002
"... This paper presents an algorithm for planning in structured partially observable Markov Decision Processes (POMDPs). The new algorithm, named PolCA (for PolicyContingent Abstraction) uses an actionbased decomposition to partition complex POMDP problems into a hierarchy of smaller subproblems. Low ..."
Abstract

Cited by 27 (1 self)
This paper presents an algorithm for planning in structured partially observable Markov Decision Processes (POMDPs). The new algorithm, named PolCA (for Policy-Contingent Abstraction), uses an action-based decomposition to partition complex POMDP problems into a hierarchy of smaller subproblems. Low-level subtasks are solved first, and their partial policies are used to model abstract actions in the context of higher-level subtasks. At all levels of the hierarchy, subtasks need only consider a reduced action, state, and observation space. The reduced action set is provided by a designer, whereas the reduced state and observation sets are discovered automatically on a subtask-by-subtask basis. This typically results in lower-level subtasks having few, but high-resolution, state/observation features, whereas high-level subtasks tend to have many, but low-resolution, state/observation features. This paper presents a detailed overview of PolCA in the context of a POMDP hierarchical planning and execution algorithm. It also includes theoretical results demonstrating that in the special case of fully observable MDPs, the algorithm converges to a recursively optimal solution. Experimental results included in the paper demonstrate the usefulness of the approach on a range of problems, and show favorable performance compared to competing function-approximation POMDP algorithms. Finally, the paper presents a real-world implementation and deployment of a robotic system which uses PolCA in the context of a high-level robot behavior control task.
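The key control pattern here is that a solved low-level subtask is invoked by its parent as a single temporally extended (abstract) action. A minimal sketch of that pattern for the fully observable case (all names are hypothetical stand-ins, not PolCA's API):

```python
def run_abstract_action(policy, terminal, state, step):
    """Run a low-level policy until it reaches its termination set.

    Returns (final_state, n_primitive_steps): from the parent subtask's
    point of view, the whole loop is one action. `policy` maps states to
    primitive actions; `step` is the environment transition function.
    """
    n_steps = 0
    while state not in terminal:
        state = step(state, policy[state])
        n_steps += 1
    return state, n_steps

# Toy chain 0 -> 1 -> 2 -> 3, where the subtask's goal is state 3.
policy = {0: "right", 1: "right", 2: "right"}
final, n = run_abstract_action(policy, {3}, 0, lambda s, a: s + 1)
print(final, n)  # → 3 3
```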
The Constructivist Learning Architecture: A Model of Cognitive Development for Robust Autonomous Robots
2004
"... Copyright by ..."
Using abstract models of behaviours to automatically generate reinforcement learning hierarchies
In Proceedings of The 19th International Conference on Machine Learning, 2002
"... In this paper we present a hybrid system combining techniques from symbolic planning and reinforcement learning. Planning is used to automatically construct task hierarchies for hierarchical models of the behaviours ’ purpose, and to perform intelligent termination improvement when an executing beha ..."
Abstract

Cited by 16 (0 self)
In this paper we present a hybrid system combining techniques from symbolic planning and reinforcement learning. Planning is used to automatically construct task hierarchies from hierarchical models of the behaviours' purpose, and to perform intelligent termination improvement when an executing behaviour is no longer appropriate. Reinforcement learning is used to produce concrete implementations of abstractly defined behaviours and to learn the best possible choice of behaviour when plans are ambiguous. Two new hierarchical reinforcement learning algorithms are presented: Planned Hierarchical Semi-Markov Q-Learning (PHSMQ), a variant of the HSMQ algorithm (Dietterich, 2000b) which uses plan-built task hierarchies, and Teleo-Reactive Q-Learning (TRQ), a more complex algorithm which implements hierarchical reinforcement learning with teleo-reactive execution semantics (Nilsson, 1994). Each algorithm is demonstrated in a simple gridworld domain.
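The SMDP-style Q-learning backup underlying HSMQ and its variants discounts the bootstrapped value by the duration of the temporally extended action. A hedged sketch of that single backup, with hypothetical names and a dictionary Q-table:

```python
def smdp_q_update(Q, s, a, reward, s_next, actions, duration,
                  alpha=0.1, gamma=0.95):
    """One SMDP Q-learning backup of the kind HSMQ-style learners use.

    `reward` is the return accumulated while the temporally extended
    action `a` ran for `duration` primitive steps; gamma is raised to
    that duration before bootstrapping. Sketch only, not the paper's code.
    """
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    target = reward + (gamma ** duration) * best_next
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]

Q = {}
# A 2-step behaviour earned reward 1.0 and led to a state with no value yet.
print(smdp_q_update(Q, "s0", "go_to_door", 1.0, "s1", ["go_to_door"], 2))
# → 0.1
```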
Toward a topological theory of relational reinforcement learning for navigation tasks
In Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS-2005), 2005
"... We examine application of relational learning methods to reinforcement learning in spatial navigation tasks. Specifically, we consider a goalseeking agent with noisy control actions embedded in an environment with strong topological structure. While formally a Markov decision process (MDP), this ta ..."
Abstract

Cited by 13 (0 self)
We examine the application of relational learning methods to reinforcement learning in spatial navigation tasks. Specifically, we consider a goal-seeking agent with noisy control actions embedded in an environment with strong topological structure. While formally a Markov decision process (MDP), this task possesses special structure derived from the underlying topology that can be exploited to speed learning. We describe relational policies for such environments that are relocatable by virtue of being parameterized solely in terms of the relations (distance and direction) between the agent's current state and the goal state. We demonstrate that this formulation yields significant learning improvements in completely homogeneous environments for which exact policy relocation is possible. We also examine the effects of non-homogeneities such as walls or obstacles and show that their effects can be neglected if they fall outside of a closed-form envelope surrounding the optimal path between the agent and the goal. To our knowledge, this is the first closed-form result for the structure of an envelope in an MDP. We demonstrate that relational reinforcement learning in an environment that obeys the envelope constraints also yields substantial learning performance improvements.
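The relocatable-policy idea can be sketched in a few lines: index the value function by the agent-to-goal offset rather than by absolute position, so every situation with the same offset shares one entry (a hypothetical helper, not the paper's exact formulation):

```python
def goal_relation(agent, goal):
    """The relational state: offset (dx, dy) from the agent to the goal.

    A Q-table indexed only by this relation is relocatable: experience
    gathered at one place transfers to every agent/goal pair with the
    same offset anywhere in a homogeneous grid. Illustrative sketch.
    """
    return (goal[0] - agent[0], goal[1] - agent[1])

# Two different absolute situations collapse to one relational state:
print(goal_relation((2, 3), (5, 5)))  # → (3, 2)
print(goal_relation((0, 0), (3, 2)))  # → (3, 2)
```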
A hierarchical approach to efficient reinforcement learning in deterministic domains
Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, 2006
"... Factored representations, modelbased learning, and hierarchies are wellstudied techniques for improving the learning efficiency of reinforcementlearning algorithms in largescale state spaces. We bring these three ideas together in a new algorithm. Our algorithm tackles two open problems from the ..."
Abstract

Cited by 11 (1 self)
Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large-scale state spaces. We bring these three ideas together in a new algorithm. Our algorithm tackles two open problems from the reinforcement-learning literature, and provides a solution to those problems in deterministic domains. First, we show how models can improve learning speed in the hierarchy-based MaxQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity for learning, but provably efficient planning as well. We illustrate the resulting performance gains in example domains. We prove polynomial bounds on the computational effort needed to attain near-optimal performance within the hierarchy.
State space reduction for hierarchical reinforcement learning
In Proc. FLAIRS Conf., 2004
"... This paper provides new techniques for abstracting the state space of a Markov Decision Process (MDP). These techniques extend one of the recent minimization models, known as ɛreduction, to construct a partition space that has a smaller number of states than the original MDP. As a result, learning ..."
Abstract

Cited by 10 (3 self)
This paper provides new techniques for abstracting the state space of a Markov Decision Process (MDP). These techniques extend one of the recent minimization models, known as ε-reduction, to construct a partition space that has a smaller number of states than the original MDP. As a result, learning policies on the partition space should be faster than on the original state space. The technique presented here extends ε-reduction to SMDPs by executing a policy instead of a single action, and by grouping all states which have a small difference in transition probabilities and reward function under a given policy. When the reward structure is not known, a two-phase method for state aggregation is introduced, and a theorem in this paper shows the solvability of tasks using the two-phase method's partitions. These partitions can be further refined when the complete reward structure is available. Simulations on different state spaces show that the policies on both the original MDP and this representation achieve similar results, while the total learning time in the partition space of the presented approach is much smaller than the total amount of time spent learning on the original state space.
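The aggregation idea, greatly simplified, is to group states whose model quantities differ by at most ε. A greedy sketch that aggregates on rewards only (the paper also compares transition probabilities under a given policy; all names here are hypothetical):

```python
def epsilon_aggregate(states, reward, epsilon):
    """Greedily partition states whose rewards differ by at most epsilon.

    Each state joins the first existing block whose representative
    (its first member) has a reward within epsilon; otherwise it starts
    a new block. Simplified sketch of ε-reduction-style grouping.
    """
    blocks = []
    for s in states:
        for block in blocks:
            rep = block[0]
            if abs(reward[s] - reward[rep]) <= epsilon:
                block.append(s)
                break
        else:
            blocks.append([s])
    return blocks

rewards = {"a": 0.0, "b": 0.05, "c": 1.0}
print(epsilon_aggregate(["a", "b", "c"], rewards, 0.1))
# → [['a', 'b'], ['c']]
```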
Hierarchical dialogue optimization using semi-Markov decision processes
In Proceedings of the European Conference on Speech Communication and Technologies (Interspeech'07), Antwerp, 2007
"... This paper addresses the problem of dialogue optimization on large search spaces. For such a purpose, in this paper we propose to learn dialogue strategies using multiple SemiMarkov Decision Processes and hierarchical reinforcement learning. This approach factorizes state variables and actions in o ..."
Abstract

Cited by 9 (5 self)
This paper addresses the problem of dialogue optimization on large search spaces. For such a purpose, in this paper we propose to learn dialogue strategies using multiple semi-Markov decision processes and hierarchical reinforcement learning. This approach factorizes state variables and actions in order to learn a hierarchy of policies. Our experiments are based on a simulated flight-booking dialogue system and compare flat versus hierarchical reinforcement learning. Experimental results show that the proposed approach produced a dramatic search space reduction (99.36%), and converged four orders of magnitude faster than flat reinforcement learning with a very small loss in optimality (on average 0.3 system turns). Results also show that the learnt policies outperformed a hand-crafted one under three different conditions of ASR confidence levels. This approach is appealing for dialogue optimization due to faster learning, reusable subsolutions, and scalability to larger problems. Index Terms: spoken dialogue systems, semi-Markov decision processes, hierarchical reinforcement learning.
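The arithmetic behind such factored search-space reductions is easy to sketch: the flat state space is the product of all variable domains, while each subtask in a hierarchy only multiplies its own group of variables. The sizes below are made up for illustration, not the paper's actual state space:

```python
from math import prod

# Joint state space of four 10-valued dialogue variables:
flat = prod([10, 10, 10, 10])  # 10000 joint states

# A hypothetical hierarchy whose two subtasks each see two variables:
hierarchical = prod([10, 10]) + prod([10, 10])  # 200 states in total

reduction = 1 - hierarchical / flat
print(f"{reduction:.2%}")  # → 98.00%
```

With more variables or finer factorizations the reduction approaches figures like the 99.36% reported in the paper, since the flat space grows multiplicatively while the hierarchical one grows roughly additively.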
An Intrinsically-Motivated Schema Mechanism to Model and Simulate Emergent Cognition
2011
"... We introduce an approach to model and simulate the early mechanisms of emergent cognition based on theories of enactive cognition and on constructivist epistemology. The agent has intrinsic motivations implemented as inborn proclivities that drive the agent in a proactive way. Following these drives ..."
Abstract

Cited by 7 (3 self)
We introduce an approach to model and simulate the early mechanisms of emergent cognition based on theories of enactive cognition and on constructivist epistemology. The agent has intrinsic motivations implemented as inborn proclivities that drive the agent in a proactive way. Following these drives, the agent autonomously learns regularities afforded by the environment, and hierarchical sequences of behaviors adapted to these regularities. The agent represents its current situation in terms of perceived affordances that develop through the agent’s experience. This situational representation works as an emerging situation awareness that is grounded in the agent’s interaction with its environment and that in turn generates expectations and activates adapted behaviors. Through its activity and these aspects of behavior (behavioral proclivity, situation awareness, and hierarchical sequential learning), the agent starts to exhibit emergent sensibility, intrinsic motivation, and autonomous learning. Following theories of cognitive development, we argue that this initial autonomous mechanism provides a basis for implementing autonomously developing cognitive systems.