J. Kruschke and J. Movellan, "Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks," IEEE Transactions on Systems, Man and Cybernetics, vol. 21, no. 1, pp. 273--280, 1991.
....generalisation performance and convergence time of BP. Current research mostly concentrates on: the optimum setting of learning rates and momentum [5, 9, 18, 40, 45, 50, 51, 52]; the optimum setting of the initial weights [6, 24, 53]; the enhancement of the contrast in the input patterns [23, 28, 1, 48, 57]; changing the error function [1, 9, 17, 2, 44, 46]; and finding optimum architectures using pruning techniques [7, 15]. In the following we will describe two speed-up methods which are relevant to the work described in the rest of the paper: the Momentum and Rprop methods. The Momentum method implements a ....
John K. Kruschke and Javier R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
....is $y = f(g \cdot net)$. It is easy to see that any change to a neuron's gain is equivalent to multiplying all of its input weights by the same factor. Therefore, when an unlimited-precision neural network is trained, the gains are usually kept constant, although gain adaptation may produce some benefits [7]. But in limited-precision networks with fixed-point representation, the inputs, weights and neuron states are represented by a finite number of bits in fixed-point format. Thus, during training, the weights are constrained within a limited range and the gains have to be set appropriately. ....
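The equivalence between a gain change and a rescaling of the input weights can be checked numerically. This is a minimal sketch that assumes a logistic activation for $f$; the names `sigmoid`, `neuron_output` and the sample values are hypothetical and are not taken from the citing paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation f(z) = 1 / (1 + exp(-z)) (an assumed choice of f)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, inputs, gain=1.0):
    """Return y = f(gain * net) with net = sum_i w_i * x_i."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(gain * net)


w = [0.4, -1.3, 0.7]   # hypothetical input weights
x = [1.0, 0.5, -2.0]   # hypothetical input pattern
g = 3.0                # hypothetical gain

# A neuron with gain g and weights w ...
y_gain = neuron_output(w, x, gain=g)
# ... behaves like a gain-1 neuron whose input weights are all multiplied by g.
y_scaled = neuron_output([g * wi for wi in w], x, gain=1.0)
print(abs(y_gain - y_scaled) < 1e-12)  # True
```

In a fixed-point implementation the scaled weights `g * w` may no longer fit the representable range, which is why the gain cannot simply be folded into the weights there.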
J. K. Kruschke and J. R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
....$E = \sum_{p=1}^{m} \sum_{i=1}^{n_k} \varepsilon_{ip}^2$. In the batched variant of the SBP, the weight update of $w^l_{ij}$ in the $s$-th learning step is given by $\Delta w^l_{ij}(s) = -\eta \frac{\partial E}{\partial w^l_{ij}}$, where $\eta$ is a parameter called the learning rate. Many methods of speeding up the SBP algorithm have been proposed [18, 21, 22, 6, 8, 11, 16]. One of the fastest variations of the SBP algorithm is Rprop [18, 21, 8]. Rprop stands for "resilient backpropagation". It is a local adaptive learning scheme, performing supervised batch learning in multilayer perceptrons. For a detailed discussion see also [18, 19]. Rprop is considerably faster ....
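To make the "local adaptive" idea concrete, here is a minimal sketch of one Rprop-style batch update. It implements the iRprop− variant (no weight-backtracking), which is a simplification of the original Rprop, not necessarily the exact form used in the citing paper. The parameter values `eta_plus=1.2`, `eta_minus=0.5`, `step_min=1e-6`, `step_max=50.0`, the initial step size 0.1, and the toy quadratic error are assumptions chosen for illustration.

```python
def irprop_minus_step(weights, grads, prev_grads, step_sizes,
                      eta_plus=1.2, eta_minus=0.5,
                      step_min=1e-6, step_max=50.0):
    """One batch update of the iRprop- variant of Rprop.

    Only the sign of each partial derivative dE/dw is used. Every weight has
    its own step size, which grows by eta_plus while the sign stays the same
    and shrinks by eta_minus when the sign flips. step_sizes is updated in
    place; the new weights and the gradients to remember are returned.
    """
    new_weights, remembered = [], []
    for i, (w, g, g_prev) in enumerate(zip(weights, grads, prev_grads)):
        if g * g_prev > 0:
            step_sizes[i] = min(step_sizes[i] * eta_plus, step_max)
        elif g * g_prev < 0:
            step_sizes[i] = max(step_sizes[i] * eta_minus, step_min)
            g = 0.0  # iRprop-: skip this weight's update after a sign change
        if g > 0:
            w -= step_sizes[i]
        elif g < 0:
            w += step_sizes[i]
        new_weights.append(w)
        remembered.append(g)
    return new_weights, remembered


# Toy batch error E(w) = sum_i (w_i - c_i)^2 with gradient 2 (w_i - c_i).
target = [3.0, -2.0]
weights = [0.0, 0.0]
prev_grads = [0.0, 0.0]
step_sizes = [0.1, 0.1]  # assumed initial step size
for _ in range(200):
    grads = [2.0 * (w - c) for w, c in zip(weights, target)]
    weights, prev_grads = irprop_minus_step(weights, grads, prev_grads, step_sizes)
print(weights)  # ends close to [3.0, -2.0]
```

Because only gradient signs are used, the size of the update does not depend on the magnitude of $\partial E / \partial w$, which is what makes the scheme insensitive to flat regions of the error surface.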
John K. Kruschke and Javier R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
....weight changes. This is known as the flat spot problem. 5. Convergence to a local minimum of the error function can occur under certain circumstances, and there is no proof that the SBP algorithm finds a global minimum. Several methods of speeding up the SBP algorithm have been proposed [17, 30, 28, 31, 16, 27, 36]. For example, the momentum method modifies the SBP algorithm as follows: $\Delta w^l_{ij}(s) = -\eta \frac{\partial E}{\partial w^l_{ij}}(s) + \alpha\, \Delta w^l_{ij}(s-1)$, where $\alpha$ is a parameter called momentum. The momentum determines the effect of the previous $\Delta w^l_{ij}$ value on the current one. This helps the ....
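The momentum rule above can be sketched in a few lines. This is an illustration only: the values `eta=0.1` and `alpha=0.9`, the function name `momentum_step`, and the toy quadratic error are assumptions, not settings from the citing paper.

```python
def momentum_step(weights, grads, prev_deltas, eta=0.1, alpha=0.9):
    """Delta w(s) = -eta * dE/dw(s) + alpha * Delta w(s-1), applied to each weight."""
    deltas = [-eta * g + alpha * d for g, d in zip(grads, prev_deltas)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas


# Toy error E(w) = sum_i (w_i - c_i)^2, gradient 2 (w_i - c_i).
target = [3.0, -2.0]
weights = [0.0, 0.0]
deltas = [0.0, 0.0]  # no previous weight change before the first step
for _ in range(500):
    grads = [2.0 * (w - c) for w, c in zip(weights, target)]
    weights, deltas = momentum_step(weights, grads, deltas)
print(weights)  # ends close to [3.0, -2.0]
```

The previous change `deltas` is carried from step to step, so consecutive gradients pointing the same way accumulate into larger steps, while alternating gradients partly cancel.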
John K. Kruschke and Javier R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
.... of gain one (solid lines) and their four times steeper counterparts (dotted lines). Several other authors hypothesized about the existence of a relationship between the gain of the activation function and the weights [Codrington 94] [Wessels 92]³, or between the gain and the learning rate [Kruschke 91] [Mundie 92] [Zurada 92]. Specifically, J. Zurada writes: ... leads to the conclusion that using activation functions with large λ {= gain (the authors)} may yield results similar as in the case of large learning constant η. It thus seems advisable to keep λ at a standard value of 1, and to control ....
.... Weight discretization with multiple thresholding of the real-valued weights [Fiesler 94.2]: the thresholds need adaptation. If $d$ and $\tilde{d}$ are the discretization functions applied to the weights, Theorem 1 holds if $\forall x: \beta_{i,j}\, d(x) = \tilde{d}(\beta_{i,j}\, x)$. Adaptive gain [Plaut 86] [Bachmann 90] [Kruschke 91] [Sperduti 93]: a change $\Delta\beta_{i,j}$ of the gain can be expressed as a change of the learning rates from $\beta_{i,j}^2\, \eta_{i,j}$ to $(\beta_{i,j} + \Delta\beta_{i,j})^2\, \eta_{i,j}$, and of the weights from $\beta_{i,j}\, w_{i,j}$ to $(\beta_{i,j} + \Delta\beta_{i,j})\, w_{i,j}$, without changing the gain of the activation functions. ....
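The gain/learning-rate interchangeability can be checked in its simplest case: one logistic neuron, squared error on a single pattern, one gradient step. The sketch assumes these choices and the hypothetical names `gd_step`, `beta` and `eta`. It illustrates that a network with gain β, weights w and learning rate η stays in step with a gain-1 network with weights βw and learning rate β²η; it is a numerical spot check, not a proof of the general theorem.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gd_step(weights, x, target, gain, eta):
    """One gradient-descent step on E = 0.5 * (y - target)^2 for a single
    neuron y = sigmoid(gain * net); returns the updated weights."""
    net = sum(w * xi for w, xi in zip(weights, x))
    y = sigmoid(gain * net)
    # dE/dw_i = (y - target) * f'(gain * net) * gain * x_i, with f' = y (1 - y)
    delta = (y - target) * y * (1.0 - y) * gain
    return [w - eta * delta * xi for w, xi in zip(weights, x)]


beta, eta = 2.5, 0.3           # hypothetical gain and learning rate
w = [0.2, -0.4, 0.1]           # hypothetical weights
x, t = [1.0, 0.5, -1.5], 1.0   # hypothetical pattern and target

# Network A: gain beta, weights w, learning rate eta.
w_a = gd_step(w, x, t, gain=beta, eta=eta)
# Network B: gain 1, weights beta * w, learning rate beta^2 * eta.
w_b = gd_step([beta * wi for wi in w], x, t, gain=1.0, eta=beta ** 2 * eta)

# After the step, beta * w_a should equal w_b (up to rounding).
print(all(abs(beta * a - b) < 1e-12 for a, b in zip(w_a, w_b)))  # True
```

Since both networks also produce the same output after the step, the argument repeats for every subsequent step, which is why the gain can be traded for a rescaled learning rate and rescaled weights.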
J. K. Kruschke and J. R. Movellan. Benefits of Gain: Speeded Learning and Minimal Hidden Layers in Backpropagation Networks. IEEE Transactions on Systems, Man and Cybernetics, volume 21, number 1, pages 273--280, January 1991.
....used for this interpretation. We believe hyperplane mass has not been employed before, at least not explicitly, in any description of back propagation. It is normally only mentioned that weights get ever larger, without any real understanding of how important this is. We are aware of one technique [9] that makes use of something similar to our mass, namely the sigmoid gain parameter $\beta$ mentioned in section 2. The interpretation was used to explain how local minima can occur, and how they may be avoided, either through correct weight initialization, or greater understanding about what larger ....
J. K. Kruschke and J. R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
....of the mapping represented by the network. This has guided our interest to the examination of the evolution of these gain values during the learning process. The results will be presented in the following sections. The role of the gain has been the subject of previous work with varying intentions. In [4] the authors report on the benefit of manipulating the gain of the neurons explicitly to improve the speed of the learning process. The work described in [8] shows the interchangeability of the gain and a locally defined learning rate. In contrast, our work aims at the exploration of the ....
....$\{1, \dots, J\}$ (1). In the following we will refer to the gain values $g_j$ as the gain spectrum. The gain spectrum and the distances $t_j$ are manipulated implicitly by the standard learning algorithm. There are some algorithms that manipulate these parameters explicitly [4], but our aim here is different. We employ the standard learning algorithm to train our network, monitor the gain values, and try to draw information concerning our data from their evolution, in addition to studying the learning process itself. We use the learning process to examine the complexity ....
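Equation (1) itself is not reproduced in the snippet. Assuming it follows the common geometric reading of a neuron's net input, $w \cdot x + b = g\,(n \cdot x - t)$ with gain $g = \lVert w \rVert$, unit normal $n = w / g$ and signed hyperplane distance $t = -b / g$, the gain spectrum of a layer could be read off a trained network as sketched below. The decomposition, the names `gain_and_distance` and `gain_spectrum`, and the example layer are all assumptions for illustration.

```python
import math


def gain_and_distance(weights, bias):
    """Rewrite net = w.x + b as g * (n.x - t).

    g = ||w|| is taken as the neuron's gain, n = w / g as the unit normal of
    its hyperplane, and t = -b / g as the signed distance of the hyperplane
    from the origin (an assumed decomposition).
    """
    g = math.sqrt(sum(w * w for w in weights))
    if g == 0.0:
        raise ValueError("all-zero weight vector has no hyperplane")
    normal = [w / g for w in weights]
    t = -bias / g
    return g, normal, t


def gain_spectrum(layer):
    """Gain values g_j for the neurons j of a layer given as (weights, bias) pairs."""
    return [gain_and_distance(w, b)[0] for w, b in layer]


layer = [([3.0, 4.0], -5.0), ([0.0, 2.0], 1.0)]  # hypothetical trained layer
print(gain_spectrum(layer))  # [5.0, 2.0]
```

Under this reading, standard backpropagation changes $g_j$ and $t_j$ only implicitly, through the weights and biases, which matches the monitoring approach described above.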
J. K. Kruschke and J. R. Movellan. Benefits of Gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
....this interpretation. We believe the idea of a hyperplane mass has not explicitly been employed before in any description of back propagation. It is normally only mentioned that weights get ever larger, without any real understanding of how important this is. We are aware of only one other technique [7] that indirectly makes use of a quantity similar to mass. The interpretation was used to explain pictorially the learning dynamics of backpropagation with hidden layers, and how the information is stored in the network in the form of hyperplane mass. The importance of weight initialization was ....
J. K. Kruschke and J. R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 1991.
....is beyond the scope of the present article, and only a brief description will be given here. The underlying concept is to train the internal slopes $\gamma$ of the neurons efficiently by also performing gradient descent with respect to them. It was first introduced in (Movellan, 1987) and further developed in (Kruschke and Movellan, 1991). In these works, the slopes are treated as additional weights and are trained at the same time. This means that both are optimized together and cannot be separated. Since NN1 is to be used online, we cannot have access to all the data, and an optimized slope structure regardless of the weight ....
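Treating the slope as one more trainable parameter amounts to adding $\partial E / \partial \gamma$ to the gradient computation. The sketch below shows this for a single logistic neuron with squared error on one pattern; the activation, error function, learning rates and the names `error`, `gradients` and `joint_step` are assumptions, not Kruschke and Movellan's exact formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error(weights, gain, x, target):
    """E = 0.5 * (y - target)^2 for a single neuron y = sigmoid(gain * net)."""
    net = sum(w * xi for w, xi in zip(weights, x))
    return 0.5 * (sigmoid(gain * net) - target) ** 2


def gradients(weights, gain, x, target):
    """Analytic dE/dw_i and dE/dgain for the same neuron."""
    net = sum(w * xi for w, xi in zip(weights, x))
    y = sigmoid(gain * net)
    d_act = (y - target) * y * (1.0 - y)  # dE/d(gain * net)
    grad_w = [d_act * gain * xi for xi in x]
    grad_gain = d_act * net
    return grad_w, grad_gain


def joint_step(weights, gain, x, target, eta_w=0.5, eta_gain=0.5):
    """One gradient-descent step that updates the weights and the gain together."""
    grad_w, grad_gain = gradients(weights, gain, x, target)
    new_weights = [w - eta_w * g for w, g in zip(weights, grad_w)]
    return new_weights, gain - eta_gain * grad_gain


weights, gain = [0.3, -0.1], 1.0  # hypothetical starting point
x, target = [1.0, 2.0], 1.0
start = error(weights, gain, x, target)
for _ in range(50):
    weights, gain = joint_step(weights, gain, x, target)
print(error(weights, gain, x, target) < start)  # True
```

Because the weight and gain gradients share the factor `d_act`, the two parameters are coupled in every update, which is the sense in which they are optimized together and cannot be separated.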
Kruschke, J. K. and Movellan, J. R. (1991). Benefits of gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man, and Cybernetics, 21(1):273--280.
No context found.
J. Kruschke and J. Movellan, "Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks," IEEE Transactions on Systems, Man and Cybernetics, vol. 21, no. 1, pp. 273--280, 1991.
No context found.
J.K. Kruschke and J.R. Movellan, "Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks," IEEE Trans. on Systems, Man and Cybernetics, vol. 21, no. 1, pp. 273--280, Jan. 1991.
No context found.
John K. Kruschke and Javier R. Movellan. Benefits of gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Trans. Systems, Man and Cybernetics, 21(1):273--280, January 1991.
No context found.
Kruschke, J.K., & J. Rodriquez-Movellan (1991). Benefits of gain: speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man, and Cybernetics, 21, 273-280.
No context found.
J. K. Kruschke and J. R. Movellan, "Benefits of Gain: Speeded Learning and Minimal Hidden Layers in Back-Propagation Networks," IEEE Transactions on Systems, Man, and Cybernetics, 21(1), 273--280 (January/February 1991).