Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in order to identify a (sub-)optimal controller. For instance, a robot facing an unknown environment has to spend time moving around