Reinforcement Learning by Construction of Hypothetical Targets
Abstract:
A general approach to delayed reinforcement learning by the use of supervised training algorithms is proposed. The approach is monolithic, direct and involves minor modifications to any supervised learning algorithm. Each connection has two weight change registers, one for eventual success and one for eventual failure, and the network is trained on self-generated hypothetical target vectors. The method is a close, but more general, relative of selective bootstrap adaptation and is tested on an abstract model of the Link Allocation problem in Asynchronous Transfer Mode (ATM) telecommunication networks.
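The two-register idea described in the abstract can be illustrated with a minimal sketch. Everything beyond the abstract is an assumption made for illustration: the class name TwoRegisterUnit, the use of a single logistic unit trained with the delta rule, the stochastic binary action, and the choice of hypothetical targets (the action taken if the episode eventually succeeds, its complement if it eventually fails, in the spirit of selective bootstrap adaptation). It is not the authors' implementation.

```python
import math
import random


class TwoRegisterUnit:
    """Hypothetical single logistic unit with two weight-change registers per connection."""

    def __init__(self, n_inputs, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.w = [0.0] * (n_inputs + 1)  # last weight is the bias
        self.lr = lr
        self.reset_registers()

    def reset_registers(self):
        # One accumulated weight change per connection for each possible outcome.
        self.dw_success = [0.0] * len(self.w)
        self.dw_failure = [0.0] * len(self.w)

    def output(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, list(x) + [1.0]))
        return 1.0 / (1.0 + math.exp(-s))

    def act(self, x):
        """Choose a stochastic binary action and accumulate both hypothetical updates."""
        y = self.output(x)
        action = 1 if self.rng.random() < y else 0
        xb = list(x) + [1.0]
        for i, xi in enumerate(xb):
            # Assumed hypothetical target on eventual success: the action taken.
            self.dw_success[i] += self.lr * (action - y) * xi
            # Assumed hypothetical target on eventual failure: the opposite action.
            self.dw_failure[i] += self.lr * ((1 - action) - y) * xi
        return action

    def end_episode(self, success):
        """Apply the register matching the delayed outcome, then clear both."""
        register = self.dw_success if success else self.dw_failure
        self.w = [wi + d for wi, d in zip(self.w, register)]
        self.reset_registers()
```

In use, act would be called once per time step of an episode and end_episode once when the delayed success or failure signal arrives, so that only the register matching the actual outcome is committed to the weights.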

