Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in order to identify a (sub-)optimal controller. For instance, a robot facing an unknown environment has to spend time moving around
|
4828
|
Genetic Algorithms
– Goldberg
- 1989
|
|
2141
|
Learning Internal Representations by Error Propagation
– Rumelhart, Hinton, et al.
- 1986
|
|
1092
|
Finding Structure in Time
– Elman
- 1990
|
|
1007
|
Neural networks and physical systems with emergent collective computational abilities
– Hopfield
- 1982
|
|
939
|
Learning from Delayed Rewards
– Watkins
- 1989
|
|
885
|
Learning to predict by the methods of temporal differences
– Sutton
- 1988
|
|
396
|
Fast learning in networks of locally-tuned processing units
– Moody, Darken
- 1989
|
|
375
|
Integrated Architectures for Learning, Planning and Reacting based on Approximate Dynamic Programming
– Sutton
- 1990
|
|
353
|
Dynamic Programming and Markov Processes
– Howard
- 1960
|
|
343
|
A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations
– Kuipers, Byun
- 1991
|
|
322
|
A learning algorithm for continually running fully recurrent neural networks
– Williams, Zipser
- 1990
|
|
306
|
Automatic programming of behavior-based robots using reinforcement learning
– Mahadevan, Connell
- 1991
|
|
252
|
Learning in Embedded Systems
– Kaelbling
- 1993
|
|
242
|
Self-improving reactive agents based on reinforcement learning, planning and teaching
– Lin
- 1992
|
|
212
|
Temporal Credit Assignment in Reinforcement Learning
– Sutton
- 1984
|
|
189
|
Learning and sequential decision making
– Barto, Sutton, et al.
- 1990
|
|
155
|
Reinforcement learning with perceptual aliasing: The Perceptual Distinctions Approach
– Chrisman
- 1992
|
|
147
|
Learning to predict by the methods of temporal differences
– Sutton
- 1988
|
|
145
|
Serial order: a parallel distributed processing approach
– Jordan
- 1989
|
|
144
|
Transfer of learning by composing solutions of elemental sequential tasks
– Singh
- 1992
|
|
109
|
Real-time learning and control using asynchronous dynamic programming
– Barto, Bradtke, et al.
- 1991
|
|
92
|
Efficient Memory-Based Learning for Robot Control
– Moore
- 1990
|
|
90
|
Efficient exploration in reinforcement learning
– Thrun
- 1992
|
|
87
|
The truck backer-upper: An example of self-learning in neural networks
– Nguyen, Widrow
- 1990
|
|
75
|
Neural Networks for Control
– Miller, Sutton, et al.
|
|
73
|
Becoming increasingly reactive
– Mitchell
- 1990
|
|
69
|
Learning to perceive and act by trial and error
– Whitehead, Ballard
- 1991
|
|
64
|
Learning to control an unstable system with forward modeling
– Jordan, Jacobs
- 1990
|
|
61
|
Programming Robots Using Reinforcement Learning and Teaching
– Lin
- 1991
|
|
58
|
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
– Lin, Mitchell
- 1992
|
|
58
|
Diversity-based inference of finite automata
– Rivest, Schapire
- 1987
|
|
50
|
Active exploration in dynamic environments
– Thrun, Moller
- 1992
|
|
49
|
Reinforcement learning in Markovian and non-Markovian environments
– Schmidhuber
- 1991
|
|
47
|
Learning and Problem Solving with Multilayer Connectionist Systems
– Anderson
- 1986
|
|
47
|
Connectionist learning for control: An overview
– Barto
- 1990
|
|
43
|
Scaling reinforcement learning to robotics by exploiting the subsumption architecture
– Mahadevan, Connell
- 1991
|
|
43
|
CMAC: An associative neural network alternative to backpropagation
– Miller, Glanz, et al.
- 1990
|
|
37
|
Neural networks and physical systems with emergent collective computational abilities
– Hopfield
- 1982
|
|
36
|
Discovering the structure of a reactive environment by exploration
– Mozer, Bachrach
- 1989
|
|
33
|
Using locally weighted regression for robot learning
– Atkeson
- 1991
|
|
32
|
A dual back-propagation scheme for scalar reward learning
– Munro
- 1987
|
|
27
|
Real-time heuristic search: New results
– Korf
- 1988
|
|
25
|
On the computational economics of reinforcement learning
– Barto, Singh
- 1991
|
|
25
|
Integrated modeling and control based on reinforcement learning and dynamic programming
– Sutton
- 1991
|
|
25
|
Complexity and cooperation in Q-learning
– Whitehead
- 1991
|
|
24
|
A study of cooperative mechanisms for faster reinforcement learning
– Whitehead
- 1991
|
|
22
|
Real-time obstacle avoidance for robot manipulator and mobile robots
– Khatib
- 1986
|
|
19
|
Optimal probabilistic and decision-theoretic planning using markovian decision theory
– Koenig
- 1992
|
|
18
|
Trajectory formation of arm movement by cascade neural network model based on minimum torque-change criterion (Biol. Cybern. 62)
– Kawato, Maeda, et al.
- 1990
|
|
18
|
Learning a cost-sensitive internal representation for reinforcement learning
– Tan
- 1991
|