R. S. Sutton, "Gain adaptation beats least squares?", in Proc. 7th Yale Workshop on Adaptive and Learning Systems, 1992, pp. 161--166, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92b.ps.gz.

This paper is cited in the following contexts:
Online Independent Component Analysis With Local.. - Schraudolph.. (1999)

....that the method of Almeida et al. [15] can indeed be improved by this approach [3]. While such averaging serves to reduce the stochasticity of the product $\delta_t \cdot \delta_{t-1}$ implied by (3) and (4), the average remains one of immediate, single-step effects. By contrast, Sutton [17, 18] models the long-term effect of p on future weight updates in a linear system by carrying the relevant partials forward through time, as is done in real-time recurrent learning [19]. This results in an iterative update rule for v, which we have extended to nonlinear systems [3, 4]. We define v ....
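
The idea of carrying the relevant partials forward through time in a linear system can be illustrated with a minimal sketch in the style of Sutton's IDBD rule. The excerpt does not spell out the update equations, so everything below is an assumption made for illustration: the names `idbd_step` and `run_demo`, the meta step size `theta`, the initial gain of 0.05, and the synthetic target weights are all hypothetical. The trace `h` is read here as playing the role the excerpt assigns to v, and `exp(beta)` the role of the per-weight gains p; that mapping is an interpretation, not something the excerpt states.

```python
import math
import random


def idbd_step(w, h, beta, x, y, theta=0.01):
    """Apply one IDBD-style update in place and return the prediction error."""
    delta = y - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        # Meta step: move the log-gain along the correlation between the
        # current gradient signal and the trace of past updates.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])          # per-weight gain, always positive
        step = alpha * delta * xi
        w[i] += step                       # ordinary LMS step with a local gain
        # Carry the partial forward: decay the old trace, add the new step.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + step
    return delta


def run_demo(steps=5000, seed=0):
    """Fit a noise-free linear target; returns (weights, log-gains, squared errors)."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.0, 0.0]         # hypothetical target weights
    n = len(true_w)
    w, h = [0.0] * n, [0.0] * n
    beta = [math.log(0.05)] * n            # assumed initial gain of 0.05
    sq_errors = []
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = sum(t * xi for t, xi in zip(true_w, x))
        sq_errors.append(idbd_step(w, h, beta, x, y) ** 2)
    return w, beta, sq_errors


if __name__ == "__main__":
    w, beta, sq_errors = run_demo()
    print("weights:", [round(v, 3) for v in w])
    print("gains:  ", [round(math.exp(b), 3) for b in beta])
```

Keeping the gains in log space keeps them positive without any clipping, and the trace `h` is what lets the update reflect more than the immediate, single-step effect of a gain change.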


Local Gain Adaptation in Stochastic Gradient Descent - Schraudolph (1999)   (5 citations)

....the optimal gain under simplifying assumptions which may or may not model a given situation well. Even such optimal algorithms as Kalman filtering can thus be outperformed by adaptation methods which measure the effects of finite step sizes in order to incrementally adapt the gain [4]. Unfortunately, most of the existing gain adaptation algorithms for neural networks adapt only a single, global learning rate [5, 6], can be used only in batch training [7, 8, 9, 10, 11, 12], or both [13, 14, 15, 16]. Given the well-known advantages of stochastic gradient descent, it would be ....

....of stochastic gradient descent, it would be desirable to have online methods for local gain adaptation in nonlinear neural networks. The first such algorithms have recently been proposed [17, 18]; here we develop a more sophisticated alternative by extending Sutton's work on linear systems [4, 19] to the general, nonlinear case. The resulting stochastic meta descent (SMD) algorithms support online learning with unprecedented speed and robustness of convergence. 2 The SMD Algorithm. Given a sequence $x_0, x_1, \ldots$ of data points, we minimize the expected value of a ....

[Article contains additional citation context not shown here]

CiteSeer.IST - Copyright Penn State and NEC