  HQ-learning (1998) [16 citations — 0 self]

by Marco Wiering, Jurgen Schmidhuber
Adaptive Behavior
ftp://ftp.idsia.ch/pub/juergen/HQ-LEARNING.ps.gz

Abstract:

HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.

Keywords: reinforcement learning, hierarchical Q-learning, POMDPs, non-Markov, subgoal learning.
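The core idea of the abstract, that a task no single memoryless policy can solve may become solvable by a sequence of reactive subagents, can be illustrated with a small Python sketch. This is not the authors' implementation. Everything in it is a hypothetical assumption chosen for illustration: a corridor of five cells where the agent only sees "L" at the left end, "R" at the right end and "C" everywhere in between; the task of walking to the right end and then back to the start; and the helper names `observe`, `step`, `train` and `rollout`. In the aliased "C" cells the agent must move right on the way out and left on the way back, which no single observation-to-action mapping can do. Further simplifications: the subgoal of the first subagent is hand-set to "R" (HQ-learning learns its subgoals, which this sketch omits), plain one-step tabular Q-learning is swapped in for Q(λ), and bootstrapping the hand-over transition from the next subagent's values is a design choice of this sketch.

```python
import random

# Hypothetical toy POMDP (not from the paper): corridor cells 0..4.
# Observations: "L" at cell 0, "R" at cell 4, "C" in the aliased middle.
N_CELLS = 5
LEFT, RIGHT = 0, 1


def observe(pos):
    if pos == 0:
        return "L"
    if pos == N_CELLS - 1:
        return "R"
    return "C"


def step(pos, action):
    # Walls at both ends: moving into a wall leaves the agent in place.
    return max(0, pos - 1) if action == LEFT else min(N_CELLS - 1, pos + 1)


def greedy(values, rng):
    # Argmax with random tie-breaking, so all-zero tables still explore.
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def train(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    # One memoryless Q-table per subagent: observation -> [Q(left), Q(right)].
    qs = [{o: [0.0, 0.0] for o in "LCR"} for _ in range(2)]
    subgoal = "R"  # hand-picked subgoal of subagent 0 (assumption, not learned)
    for _ in range(episodes):
        pos, agent = 0, 0
        for _ in range(max_steps):
            obs = observe(pos)
            q = qs[agent]  # table of the subagent currently in control
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy(q[obs], rng)
            pos = step(pos, action)
            new_obs = observe(pos)
            if agent == 1 and pos == 0:
                # Goal: back at the start after reaching the far end (reward 1).
                q[obs][action] += alpha * (1.0 - q[obs][action])
                break
            if agent == 0 and new_obs == subgoal:
                # Subgoal seen: control passes to subagent 1; bootstrap from it.
                target = gamma * max(qs[1][new_obs])
                agent = 1
            else:
                target = gamma * max(q[new_obs])
            q[obs][action] += alpha * (target - q[obs][action])
    return qs


def rollout(qs, max_steps=20):
    # Purely greedy run; True if the goal is reached within max_steps.
    pos, agent = 0, 0
    for _ in range(max_steps):
        values = qs[agent][observe(pos)]
        pos = step(pos, values.index(max(values)))
        if agent == 1 and pos == 0:
            return True
        if agent == 0 and observe(pos) == "R":
            agent = 1
    return False


if __name__ == "__main__":
    qs = train()
    print("greedy rollout reaches goal:", rollout(qs))
```

After training, subagent 0 should prefer moving right in "L" and "C" while subagent 1 prefers moving left in "C" and "R", so each subagent remains purely reactive and the memory needed by the task lives only in which subagent is active.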
