"Learning from Delayed Rewards", Christopher Watkins, Ph.D. Thesis, King's College, Cambridge.
...embodied mobile robots, so that we no longer have to do it all by hand. There have been no reports to date of programs evolved for embodied robots. There has been work on learning new behaviors using reinforcement learning; e.g., [Kaelbling 90] and [Mahadevan and Connell 90] used Q learning [Watkins 89]. The major drawback is the large number of runtime trials, many more than real animals need, and the need to shape the learning carefully by splitting the task into small pieces that the robot learns sequentially. It seems that real animals have innate, built-in structures that ...
...[Atkeson 89] and [Viola 90], but in the main they are not based on back propagation. The most successful recent learning techniques for situated, embodied, mobile robots have not been based on parallel algorithms at all; rather, they use a reinforcement learning algorithm such as Q learning [Watkins 89], as in, for example, [Kaelbling 90] and [Mahadevan and Connell 90]. One problem for neural networks becoming situated or embodied is that they do not have a simple translation into time-varying perception or action pattern systems. They need extensive front and back ends to equip them to interact ...
"Learning from Delayed Rewards", Christopher Watkins, Ph.D. Thesis, King's College, Cambridge, 1989.
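The excerpts above credit Q learning [Watkins 89] as the basis of the robot-learning work they cite. As a rough illustration only (not code from any of the cited papers), the following is a minimal sketch of tabular Q learning on a hypothetical six-state corridor where the only reward arrives on reaching the goal, i.e. a delayed reward. The environment, the helper names `step`, `greedy`, and `q_learning`, and every parameter value (`alpha`, `gamma`, `epsilon`, episode count) are assumptions chosen for illustration. Each update moves Q(s, a) toward r + gamma * max_a' Q(s', a').

```python
import random

# Hypothetical toy task: a corridor of states 0..5, start at 0, goal at 5.
N_STATES = 6
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Deterministic corridor; reward 1.0 only when the goal is reached (delayed reward)."""
    if action == LEFT:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=200, seed=0):
    """Tabular Q learning with epsilon-greedy exploration (illustrative parameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q[s], rng)
            s2, r, done = step(s, a)
            # One-step Q-learning target; no bootstrap from the terminal state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning()
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print(policy)  # expected: [1, 1, 1, 1, 1], i.e. always move RIGHT
```

Even on this toy task the agent is run for hundreds of simulated episodes; on a physical robot each episode would be a real trial, which is the cost the first excerpt points to. The episode count here is chosen generously, not tuned.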
...a single robot.

2.2 ROBOTS

We have built over ten robots that are programmed with the subsumption architecture; see [Brooks 90b] for a recent overview. Some of these robots have had learning capabilities, but unlike the work with physical robots of [Kaelbling 90] and [Mahadevan and Connell 91], they have not followed reinforcement learning techniques such as the Q learning of [Watkins 89].

2.3 Sensors and Action

Real sensors are very noisy. They do not give the same sort of simple mapping from actual world state to a clean input vector that we are used to expecting after using simulated ...