HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work. Keywords: reinforcement learning, hierarchical Q-learning, POMDPs, non-Markov, subgoal learning.
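The decomposition idea can be illustrated with a toy example. The sketch below is not the HQ-learning algorithm: nothing is learned, and the four-state world, the hand-written policies, the `run` helper, and the rule that control passes to the next policy when a chosen subgoal observation is seen are all assumptions made for illustration. It only shows why a sequence of memoryless (observation-to-action) policies can reach a goal that no single deterministic memoryless policy can reach when two states look identical.

```python
from itertools import product

# Hypothetical world: states 0 and 2 emit the same observation "X"
# (perceptual aliasing), but they require different actions.
OBS = {0: "X", 1: "Y", 2: "X", 3: "G"}  # state 3 is the goal
STEP = {
    (0, "left"): 0, (0, "right"): 1,
    (1, "left"): 0, (1, "right"): 2,
    (2, "left"): 3, (2, "right"): 0,
}
ACTIONS = ("left", "right")


def run(policies, subgoals, start=0, max_steps=20):
    """Follow a chain of memoryless policies.

    policies[i] maps observation -> action. Control passes from
    policy i to policy i+1 the first time subgoals[i] is observed
    (an assumed hand-off rule). Returns the number of steps taken
    to reach the goal, or None if it is not reached in max_steps.
    """
    state, active = start, 0
    for t in range(max_steps):
        obs = OBS[state]
        if obs == "G":
            return t
        if active < len(subgoals) and obs == subgoals[active]:
            active += 1
        state = STEP[(state, policies[active][obs])]
    return None


# Every deterministic single memoryless policy over {"X", "Y"}.
single_results = [
    run([{"X": ax, "Y": ay}], [])
    for ax, ay in product(ACTIONS, repeat=2)
]

# Two reactive policies chained through the subgoal observation "Y".
chain_result = run(
    [{"X": "right", "Y": "right"}, {"X": "left", "Y": "right"}],
    ["Y"],
)

print(single_results)  # every entry is None: no single policy succeeds
print(chain_result)    # the chained pair reaches the goal
```

In this toy world the first policy only has to reach observation "Y", and the second only has to reach the goal from there, so each subtask is solvable without memory even though the overall task is not.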