162 citations found.
R. A. Jacobs. Increased Rates of Convergence Through Learning Rate Adaptation. Technical Report: UM-CS-1987-117. University of Massachusetts, Amherst, MA, 1987.

This paper is cited in the following contexts:

Genetic Programming Discovers Efficient Learning Rules for the.. - Radi, Poli (1999)

....and the error 1 Sip (s) and r is a parameter called learning rate. 2.2 Improvements to SBP. Many methods have been proposed to improve generalisation performance and convergence time of BP. Current research mostly concentrates on: the optimum setting of learning rates and momentum [5, 9, 18, 40, 45, 50, 51, 52]; the optimum setting of the initial weights [6, 24, 53]; the enhancement of the contrast in the input patterns [23, 28, 1, 48, 57]; changing the error function [1, 9, 17, 2, 44, 46]; finding optimum architectures using pruning techniques [7, 15]. In the following we will describe two speed up methods ....

....learning rate coefficient implicitly by adding to the weight change a fraction of the last weight change as follows: where p is a parameter called momentum. This method decreases the oscillation which may occur with large learning rates and accelerates the convergence. For a more detailed discussion see [9, 18, 50]. Rprop is one of the fastest variations of the SBP algorithm [40, 4, 56]. Rprop stands for Resilient backpropagation. It is a local adaptive learning scheme, performing supervised batch learning. The basic principle of Rprop is to eliminate the harmful influence of the magnitude of the partial derivative on ....

Robert A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295-307, 1988.


Evolving Artificial Neural Networks - Yao (1999)   (66 citations)

....evolution could be considered as the first attempt of the evolution of learning rules [32], [152], [272]. Harp et al. [152] encoded BP's parameters in chromosomes together with ANN's architecture. This evolutionary approach is different from the nonevolutionary one such as offered by Jacobs [273] because the simultaneous evolution of both algorithmic parameters and architectures facilitates exploration of interactions between the learning algorithm and architectures such that a near optimal combination of BP with an architecture can be found. Other researchers [32], [139], [213], [272] ....

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, no. 3, pp. 295--307, 1988.


Hybrid Computational Intelligence Schemes in Complex.. - Tsakonas, Dounias (2002)

....basis functions. Neural networks controlled by fuzzy logic. Some basic theoretical aspects, detailed description of the characteristics of the methodological components, as well as the early adoptions of neural networks controlled by fuzzy logic, can be found in a series of publications [15], [16], [17], [18], [19], [20], [21] and [22]. In [23] is addressed the concept of a fuzzy neural network to implement syllogistic fuzzy reasoning. In syllogistic fuzzy reasoning, the consequence of a rule in one reasoning stage is passed to the next stage as a fact. The approach is shown to be ....

Jacobs R.A. Increased rates of convergence through learning rate adaptation, Neural Networks Vol.1, 295-307, 1988


Deterministic Nonmonotone Strategies for Effective.. - Plagianakos..

....algorithm and avoid oscillations in a steep direction of the error surface. However, it is well known that this approach tends to be inefficient. For example, this happens when the search space contains long ravines that are characterized by sharp curvature across them and a gently sloping floor [28], [62]. Moreover, this approach introduces difficulties in obtaining convergence of BP training algorithms [33], [38]. Nevertheless, there are theoretical results that guarantee the convergence of batch BP algorithms for a constant learning rate. In this case, the learning rate should be ....

.... keep gradient direction fairly constant, or rapidly decrease it, if the direction of the gradient varies greatly at each epoch [11]. 3) For each weight, an individual learning rate is given, which increases if the successive changes in the weights are in the same direction and decreases otherwise [28], [54], [60], [63]. 4) Use a closed formula to calculate a common learning rate for all the weights at each iteration [27], [42], [56] or a different learning rate for each weight [15], [43]. Note that all the above mentioned strategies employ heuristic parameters in an attempt to enforce the ....

[Article contains additional citation context not shown here]

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


Globally Convergent Algorithms With Local Learning Rates - Magoulas, Plagianakos..

.... constant, or rapidly decrease it if the direction of the gradient varies greatly [4]; 3) use a local learning rate for each weight $w_i$ ($i = 1, 2, \ldots, n$), i.e. $\eta_1, \ldots, \eta_n$, which increases if the successive corrections of the weights are in the same direction and decreases otherwise [8], [19], [23], [27]. This paper focuses on the last approach and particularly on the special class of first order adaptive training algorithms that employ local learning rates. These algorithms employ heuristic strategies to adapt the learning rates at each iteration and require fine tuning ....

....as well as to exploit the parallelism inherent in the evaluation of the error, $E(w)$, and its gradient, $\nabla E(w)$, by the backpropagation (BP) algorithm, consists of using a different adaptive learning rate for each direction in weight space. Batch type BP training algorithms of this class [6], [8], [19], [23], [27] follow the iterative scheme $w^{k+1} = w^k - \operatorname{diag}\{\eta_1^k, \ldots, \eta_n^k\}\,\nabla E(w^k)$ (1) and try to decrease the error by searching a local minimum with small weight steps. These steps are usually constrained by problem dependent heuristic parameters in order to avoid oscillations and ensure subminimization of ....

[Article contains additional citation context not shown here]

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


Linear array architecture implementing the.. - Marchesi, Orlandi.. (1990)   (1 citation)

....of implementing more than a neuron in a single PE. This allows to share the resources and to optimize the silicon area occupancy. Therefore the architecture can be implemented as mixed grain computing structure. Moreover more complex learning algorithms (momentum, the delta delta learning rule [5], etc.) can be easily implemented. PROCEDURE ForwardMode (VAR PE:Neuron) wait for a valid input forward mode operations IF Right.Valid3=on THEN BEGIN BEGIN WITH PE DO State: State 1; BEGIN WaitStates: Position; IF LeftActivated THEN END; BEGIN IF State= i THEN signal busy state wait ....

R.A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks, Vol. 1, No. 4, 1988, pp.295-307.


Modelling Chaotic Systems with Neural Networks: Application to.. - van Zyl

....information (conjugate gradient, Quasi Newton, second order calculation of the step size) stochastic optimisation, and heuristics utilising the sign of the local gradient, the angle between gradient direction, or peak learning rate values. We discuss the delta bar delta rule by Jacobs [28], and refer the reader to the literature for a survey of some of the other common techniques employed [45]. The delta bar delta adaptive learning rate technique defines a separate learning rate for each weight. It uses an estimation of the slope of the local error function to adjust the learning ....

R. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


A Spectral Version Of Perry's Conjugate Gradient.. - Sotiropoulos.. (2002)

....have been tested, are the eXclusive OR Problem, the 3 bit Parity Problem, the Font Learning Problem and the Continuous Function Approximation Problem. On each problem, four algorithms have been simulated. These algorithms are the standard backpropagation (BP), the momentum backpropagation (MBP) [3], the adaptive backpropagation (ABP) [12] and the NSCGBP. For the simulations an IBM PC compatible with Matlab Version 5.3 has been used. We have utilized Matlab Neural Network Toolbox Version 3.0 for the BP, MBP and ABP algorithms. For the heuristic parameters of the previous three algorithms, ....

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, (1988).


Classification Using Hierarchical Mixtures Of Experts - Waterhouse, Robinson (1994)   (20 citations)

....with a total of 9 parameters. By way of comparison, conventional feed forward networks, using a 2-2-1 structure can solve this problem at the 0.6 threshold in an average of 19 epochs of quickprop [7]. Using the delta bar delta rule, an average training time of 250.4 epochs has been reported [9]. Using the HME, we reach the 0.6 threshold in an average of 2.76 epochs, and the 0.99 threshold in an average of 11.5 epochs. Of all these trials of the HME on the XOR problem, none failed to converge or had to be restarted. The results on the N input Parity problem are shown in Table 2. By ....

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


Improving the Rprop Learning Algorithm - Igel, Hüsken (2000)   (15 citations)

....$\Delta_{ij}^{(t-1)}$ else, where $0 < \eta^- < 1 < \eta^+$. If the partial derivative $\partial E / \partial w_{ij}$ possesses the same sign for consecutive steps, the step size is increased, whereas if it changes sign, the step size is decreased (the same principle is also used in other learning methods, e.g. [5, 18]). The step sizes are bounded by the parameters $\Delta_{\min}$ and $\Delta_{\max}$. In the following, we describe the second part of the algorithm, the update of the weights, for the different Rprop versions. 2.1 Rprop with Weight Backtracking. After adjusting the step sizes, the weight updates $\Delta w_{ij}$ are determined. ....

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4):295-307, 1988.
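
The sign-based principle quoted in this context (grow the step size while the partial derivative keeps its sign, shrink it when the sign flips, and keep it within fixed bounds) can be sketched as follows. This is only an illustrative sketch, not code from the cited paper: the function name `rprop_step_sizes` and the default values for `eta_plus`, `eta_minus`, `step_min` and `step_max` are assumptions chosen for the example.

```python
def rprop_step_sizes(steps, prev_grads, grads,
                     eta_plus=1.2, eta_minus=0.5,
                     step_min=1e-6, step_max=50.0):
    """Adapt one step size per weight from the signs of successive gradients.

    All names and default constants are illustrative assumptions.
    """
    new_steps = []
    for step, g_prev, g in zip(steps, prev_grads, grads):
        product = g_prev * g
        if product > 0:
            # Same sign on consecutive steps: increase, capped at step_max.
            step = min(step * eta_plus, step_max)
        elif product < 0:
            # Sign change: decrease, floored at step_min.
            step = max(step * eta_minus, step_min)
        # product == 0: leave the step size unchanged.
        new_steps.append(step)
    return new_steps


if __name__ == "__main__":
    print(rprop_step_sizes([0.1, 0.1, 0.1], [1.0, 1.0, 0.0], [2.0, -3.0, 1.0]))
```

Only the step-size adaptation is shown; how the weights are then moved differs between the Rprop variants that the citing paper goes on to describe.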


First and Second-Order Methods for Learning: between Steepest.. - Battiti (1992)   (88 citations)

....Another difficulty related to the use of a momentum term is due to the fact that there is an upper bound on the adjustment caused by the momentum. For example, if all partial derivatives are equal to 1, then the exponentially weighted sum caused by the momentum rate $\alpha$ converges to $1/(1-\alpha)$ (see [Jacobs 1988] for details). Furthermore, summing the momentum term to the one proportional to the negative gradient may produce an ascent direction, so that the error increases after the weight update. Among the researchers using conjugate gradient methods for the MLP are [Barnard and Cole 1988], Johansson ....

....function and have been used for problems with up to thousands of variables 6. Additional references are [Gawthrop and Sbarbaro 1990] and [Kollias and Anastassiou 1989]. In [Kollias and Anastassiou 1989] the Levenberg Marquardt technique is combined with the acceleration techniques described in [Jacobs 1988] and [Silva and Almeida 1990]. 28 I owe this information to prof. Christopher G. Atkeson. MINPACK is described in [Moré et al. 1980], NL2SOL in [Dennis et al. 1981]. 26 In reality the test problems presented in this paper have a special partially separable structure, so that their practical ....

[Article contains additional citation context not shown here]

Jacobs, R.A. Increased rates of convergence through learning rate adaptation. Neural Networks 1, 295-307.


A Nonmonotone Backpropagation Training Method for.. - Plagianakos.. (1998)

....has been developed to study the performance of the learning algorithms. The simulations have been carried out on a Pentium 133MHz PC IBM compatible using MATLAB version 5.01. The performance of the NMBP algorithm has been evaluated and compared with the batch versions of BP, momentum BP (MBP) [5], and adaptive BP (ABP) [12] from Matlab Neural Network Toolbox version 2.0.4. Toolbox default values for the heuristic parameters, of the above algorithms, are used unless stated otherwise. For the NMBP algorithm, we have fixed the values of M = 10 and # = 10 5 . Especially for the XOR problem ....

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, (1988).


Classification and Regression using Mixtures of Experts - Waterhouse (1997)   (7 citations)

....that the parameters of adjacent hidden layers are tied. Optimisation may then be carried out using any standard gradient based scheme. The actual optimisation scheme used in the RNN of ABBOT is similar to the RProp algorithm of Schiffmann et al. [211] and delta bar delta algorithm of Jacobs [97] but was developed independently by Robinson [198]. For a parameter vector the algorithm takes the following form. The objective function is the cross entropy between the targets and the predictions of the network. For each iteration of the algorithm a ....

Jacobs, R. A. [1988], `Increased rates of convergence through learning rate adaptation', Neural


An Improved Backpropagation Method with Adaptive.. - Plagianakos.. (1998)

....has been developed to study the properties of the learning algorithms. The simulations have been carried out on a Pentium 133MHz PC IBM compatible using MATLAB version 5.01. The performance of the BBP algorithm has been evaluated and compared with the batch versions of BP, momentum BP (MBP) [3], and adaptive BP (ABP) [7] from Matlab Neural Network Toolbox version 2.0.4. Toolbox default values for the heuristic parameters of the above algorithms are used, unless stated otherwise. For the BBP algorithm, maximum growth factor value is fixed to = 2. The algorithms were tested using the ....

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, (1988).


Neural receding horizon control with applications to a 6 DoF.. - Wan, Bogdanov (2001)

....output, which in turn gives more authority to the neural controller. The training process is repeated at the update interval to recompute new weights w for the next horizon. 1 In practice we use an adaptive learning rate for each weight in the network using a procedure similar to delta bar delta [5]. 4. Application to Helicopter Control. We apply the RH control approach to a nonlinear 6 DoF model of a helicopter. The model includes rigid body dynamics supplemented with forces and moments induced by the main and tail rotors. The 12 dimensional state vector consists of Cartesian coordinates ....

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, Vol. 1, 1988, pp. 295-307.


Teaching a Robot How to Read Symbols - Michaud, Létourneau (2001)

....to compensate for any rotation (skew) of the character is made by the algorithm. However, images in the training set sometimes contain small angles depending on the angle of view of the camera in relation to the perceived symbol. Training of the neural networks was done using delta-bar-delta [6], which adapts the learning rate of the backpropagation learning law. The activation function used is the hyperbolic tangent, with activation values between -1 and 1. As indicated previously, the input layer of these neural networks is made of 117 neurons, one for each element of the 9 × 13 ....

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295-307, 1988.


PVM-based Training of Large Neural Architectures - Plagianakos, Magoulas..

.... frames of the same sequence, and used for training in parallel the network to discriminate between malignant and normal regions (see [6] for further technical details). Using the proposed parallel methodology, the following batch training algorithms have been tested: the BP, the momentum BP (MBP) [5], the adaptive BP (ABP) [13], the Non Monotone BackPropagation with Variable Stepsize (NMBPVS) [8, 11] and the Non Monotone Barzilai Borwein backPropagation NMBBP [11] algorithms. The algorithms have been tested using the same initial weights, initialized by the Nguyen Widrow method [9] and ....

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, (1988).


Development and Convergence Analysis of Training.. - Magoulas..

....and to avoid oscillations in a direction where the error function is steep. It is well known that this approach tends to be inefficient. This happens, for example, when the search space contains long ravines that are characterized by sharp curvature across them and a gently sloping floor [8, 20]. Obtaining efficient convergence of BP training algorithms utilizing a constant learning rate is considered particularly difficult [9, 10]. On the other hand, there are theoretical results that guarantee the convergence when the learning rate is constant. In this case the learning rate is ....

....exploit the parallelism inherent in the evaluation of $E(w)$ and $\nabla E(w)$ by the BP algorithm, consists of using a different learning rate for each direction in weight space. Various batch type BP training algorithms with an adaptive learning rate for each weight have been suggested in the literature [5, 8, 16, 18, 21]. Following this approach equation (2) is reformulated to the following scheme: $w^{k+1} = w^k - \operatorname{diag}\{\eta_1^k, \ldots, \eta_n^k\}\,\nabla E(w^k)$ (3). The algorithms that follow the above scheme try to decrease the error by searching a local minimum with small weight steps. These steps are usually ....

[Article contains additional citation context not shown here]

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, 1988.


Nonmonotone Methods for Backpropagation Training with.. - Plagianakos, Vrahatis

....avoid convergence to a saddle point or a maximum. In practice, a small constant learning rate is chosen ($0 < \eta < 1$) in order to secure the convergence of the BP training algorithm and to avoid oscillations in a direction where the error function is steep. It is well known that this approach tends to be inefficient [12, 28]. For difficulties in obtaining convergence of BP training algorithms utilizing a constant learning rate see [13, 16]. On the other hand, there are theoretical results that guarantee the convergence when the learning rate is constant. In this case the learning rate is proportional to the inverse of the Lipschitz ....

.... epochs keep gradient direction fairly constant, or rapidly decrease it, if the direction of the gradient varies greatly at each epoch [6] and (iii) for each weight an individual learning rate is given, which increases if the successive changes in the weights are in the same direction and decreases otherwise [12, 22, 26, 29]. Note that all the above mentioned strategies employ heuristic parameters in an attempt to enforce the monotone decrease of the learning error and to secure the convergence of the training algorithm. A different approach is based on Goldstein's and Armijo's work on steepest descent and gradient methods. The method ....

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, (1988).


A Framework for the Development of Globally.. - Magoulas..

.... gradient direction fairly constant, or rapidly decrease it if the direction of the gradient varies greatly at each epoch [4] or (iii) for each weight an individual learning rate is given, which increases if the successive changes in the weights are in the same direction and decreases otherwise [10, 19, 21, 24]. Note that all the above mentioned strategies employ heuristic parameters in an attempt to enforce the monotone decrease of the learning error and to secure the converge of the training algorithm to a minimizer of E. A different approach is based on Goldstein's and Armijo's work on ....

....derivative of E(w) with respect to the ith weight and or, depending on the algorithm, the history of the corresponding learning rate. Appropriate values of the heuristics ensure that the error function is decreased in each weight direction, every epoch. The well known delta bar delta method [10] and Silva and Almeida's method [24] follow this approach. Another method, named quickprop [6], is based on independent secant steps in the direction of each weight. The Rprop algorithm [21] updates the weights using the learning rate and the sign of the partial derivative of the error function ....

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, Vol.1, 1988, pp. 295--307.


Recognition of Consonant-Vowel (CV) Utterances Using Modular Neural .. - Rao (2000)

....training set that includes a large number of negative examples compared to number of positive examples. Studies to improve the performance of the standard backpropagation algorithm for fast convergence use adaptive methods that vary the learning rate and other parameters during training [44]. An analysis of slow convergence of backpropagation method due to imbalanced training set, and a modified version of the standard backpropagation algorithm has been presented in [45]. Speech recognition system for isolated Cantonese tonal syllables based on One Class One Network has been ....

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295-307, 1988.


Using Neural Networks To Position Live Loads On Bridge Piers - Williams (2000)

....important, the choice of a large step size may fail to locate a minimum by skipping over any drops in the error surface. Adaptive learning seeks to rectify both problems by modifying the step size during the optimization process. The most prominent form of adaptive learning was proposed by Jacobs (1988). The original form of the method, known as the delta delta rule, introduced separate learning rates for each weight in the network. This learning rate acts as a multiplier in front of the weight changes. Therefore, a small factor would slow down the weight adjustment process and thereby slow down ....

Jacobs, R. (1988). Increased Rates of Convergence through Learning Rate Adaptation.


A System for Support of Humans with Damaged Sight.. - Josifovski.. (1996)

....is in (0,1) range, 0 meaning absence, and 1 presence of syllable break. Figure 3 shows an example of how the probability of syllable break existence is determined for the second letter ( a ) in the word baba . Classic back propagation (BP) [24] of error, and modified BP with variable step length [25], are used in the process of training. We considered other improvements to speed up the training process [26]. The usage of tanh instead sigmoid transfer function significantly increased the learning speed. However, the learning process failed occasionally. Weights are updated after the ....

Jacobs R. A., "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks, Vol. 1, pp 295-307, 1988


Fuzzy Inference System Learning by Reinforcement Methods - Jouffe (1997)   (2 citations)

....the same sign for several consecutive time steps, its learning rate should be increased, and 4. when the parameter derivative sign alternates for several consecutive time steps, its learning rate should be decreased. To implement these heuristics we use the Delta Bar Delta learning rule [48] in which the learning rate update rule is defined as follows: $\Delta\beta_i^t = \kappa$ if $\bar{\delta}_i^{t-1}\,\delta_i^t > 0$; $-\phi\,\beta_i^t$ if $\bar{\delta}_i^{t-1}\,\delta_i^t < 0$; $0$ otherwise; (14) where in our case: $\beta_i^t$ is the critic learning rate of rule $R_i$ at time step $t$, ....

R.A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


Successes And Failures Of Backpropagation: A Theoretical.. - Frasconi, Gori, Tesi   (3 citations)

....whole learning environment. When using a gradient descent learning scheme Rumelhart et al. ([44], p. 330) have suggested the adoption of the momentum term which acts like a low pass filter for smoothing the weight changes. Much attention has been paid to the adaptive selection of the learning rate [32] and also to second order methods based on different Hessian approximations. A problem discovered with second order methods is that they do not seem to scale up very well with the dimension of the problem [55, 39]. Using an approach more heuristic than formal, Fahlman [15] has developed an ....

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation", Neural Networks, Vol. 1, 1988, pp. 295-307.


On the Convergence of the LMS Algorithm with Adaptive Learning Rate .. - Luo (1990)   (19 citations)

....stepsize is kept fixed. We also give a general condition for the learning rates under which the LMS algorithm is guaranteed to converge to the optimal weight matrix BA . The effect of adaptively altering the learning rate in each training cycle has also been the subject of study by Jacobs [Jac88]. In particular, Jacobs has proposed a heuristic scheme in which if the current gradient is in the same direction as the previous gradient, the learning rate increases linearly with time, while if the gradient has changed directions, the learning rate decays exponentially. It has been observed ....

Jacobs, R. A., "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks, Vol. 1, 1988, pp. 295--307.
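
The heuristic summarized in this context (additive growth of a per-weight learning rate while the gradient keeps its direction, multiplicative decay when it reverses) is commonly written as the delta-bar-delta rule. The sketch below is a minimal illustration under assumptions: the function name `delta_bar_delta`, the parameter names `kappa`, `phi`, `theta` and their default values are chosen for the example and are not taken from any of the citing papers.

```python
def delta_bar_delta(rates, bar_deltas, grads, kappa=0.01, phi=0.1, theta=0.7):
    """One learning-rate update per weight in the delta-bar-delta style.

    rates      -- current per-weight learning rates
    bar_deltas -- exponential averages of past gradients, one per weight
    grads      -- current partial derivatives, one per weight
    Parameter names and defaults are illustrative assumptions.
    """
    new_rates, new_bars = [], []
    for rate, bar, g in zip(rates, bar_deltas, grads):
        agreement = bar * g
        if agreement > 0:
            # Current gradient agrees with the averaged past gradient:
            # add a constant, so the rate grows linearly over time.
            rate = rate + kappa
        elif agreement < 0:
            # Direction reversed: scale down, so the rate decays exponentially.
            rate = rate * (1.0 - phi)
        new_rates.append(rate)
        # Update the exponential average used for the next comparison.
        new_bars.append((1.0 - theta) * g + theta * bar)
    return new_rates, new_bars


if __name__ == "__main__":
    print(delta_bar_delta([0.1, 0.1, 0.1], [1.0, 1.0, 0.0], [2.0, -2.0, 1.0]))
```

Comparing against an averaged past gradient rather than only the previous one is the design choice that distinguishes delta-bar-delta from the simpler delta-delta rule mentioned in other contexts on this page.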


Highly Irregular Elliptical Object Localisation In.. - Carvalho, Costa..

....minima, it is also true that the additional weights can create new channels, through which the gradient can descent and reach the global minimum. In order to reduce the probability of getting stuck in a local minimum the network was tuned using the momentum with an adaptive learning rate algorithm [11], to speed up the training process. The used training set was composed by 841 pairs of EAS (Q=841) from artificial and real elliptical objects. It was considered, for training purposes, that AGC EAS EAS same ellipse EAS EAS same ellipse i q k j i k q k l i k j q k l , ....

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation", Neural Networks, 1, 295-307, 1988.


A Successive Overrelaxation Back Propagation.. - De Leone.. (1999)

....1.0] interval. In Table I we present the result for the classic XOR problem. The stopping criterion is the 0.4-0.6 threshold. The maximum number of epochs is fixed to 400. The XOR problem with the same topology used here has been studied by different authors. For the standard Jacobs method [19] convergence requires an average of 530 epochs. For the Jacobs method with delta bar delta rule [19], 250 epochs are required, in the average. Rumelhart, Hinton and Williams [19] report that Y. Chauvin found that the average number of epochs with $\eta = 0.25$ was 245. Fahlman with the more ....

....the 0.4-0.6 threshold. The maximum number of epochs is fixed to 400. The XOR problem with the same topology used here has been studied by different authors. For the standard Jacobs method [19] convergence requires an average of 530 epochs. For the Jacobs method with delta bar delta rule [19], 250 epochs are required, in the average. Rumelhart, Hinton and Williams [19] report that Y. Chauvin found that the average number of epochs with $\eta = 0.25$ was 245. Fahlman with the more sophisticated Quickprop algorithm [20] achieves convergence in 24 epochs on the average. In Table II ....

[Article contains additional citation context not shown here]

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


A New Higher-order Binary-input Neural Unit: Learning and.. - Sahin (1994)

....$-b\eta$ if $\Delta E > 0$, $0$ otherwise (4.6) where a and b are appropriate constants. The meaning of consistently is based on last K steps. In the simulations, a, b and K are set as a = 0.05, b = 0.3 and K = 10. Such an automatic adjustment of $\eta$ has been proposed by various authors, e.g. [17], [44] for the backpropagation algorithm, which we have adopted here. The value of $\phi(X)$ is unbounded which may lead to blow ups during the learning process. We have, however, avoided this by imposing a constraint bounding $|H_{jk}|$ and $|M_{jk}|$ by 1 during the simulations. It is important ....

R.A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks, 1, 295, (1988).


Inferring Motor Programs from Images of Handwritten Digits - Geoffrey

No context found.

R. A. Jacobs. Increased Rates of Convergence Through Learning Rate Adaptation. Technical Report: UM-CS-1987-117. University of Massachusetts, Amherst, MA, 1987.


Journal of Machine Learning Research 7 (2006) 1159--1182.. - Based On Sensitivity

No context found.

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1 (4):295--308, 1988.


Fuzzy Sets and Systems 157 (2006) 1851 -- 1863 - Www Elsevier Com

No context found.

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Network 1 (1988) 295--307.


H. Yamaguchi, "Efficient encoding of colored pictures in.. - Vector Quantization Of (1999)

No context found.

R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


A General Feed-Forward Algorithm for Gradient Descent in.. - Thrun, Smieja (1990)   (2 citations)

No context found.

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295--307, 1988.


Discovering Efficient Learning Rules for Feedforward Neural.. - Radi, Poli (2002)

No context found.

R. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, no. 1, pp. 295--307, 1989.


High Classification Accuracy Does Not Imply Effective Genetic.. - Kovacs, Kerber

No context found.

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295-307, 1988.


Rprop - Description and Implementation Details - Riedmiller (1994)   (5 citations)

No context found.

R. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4), 1988.


A Class Of Gradient Unconstrained Minimization.. - Vrahatis.. (2000)   (1 citation)

No context found.

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks 1 (1988) 295--307.


A new efficient variable learning rate for Perry's spectral .. - Kostopoulos, Grapsa (2004)

No context found.

R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1, 295--307, (1988).


Neural Networks and Evolutionary Computation. Part I: Hybrid.. - Weiß

No context found.

Jacobs, R.A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295--307.


Online Independent Component Analysis with Local.. - Schraudolph.. (2000)

No context found.

R. Jacobs, "Increased rates of convergence through learning rate adaptation", Neural Networks, 1:295-307, 1988.


Local Gain Adaptation in Stochastic Gradient Descent - Schraudolph (1999)   (5 citations)

No context found.

R. Jacobs, "Increased rates of convergence through learning rate adaptation", Neural Networks, 1:295-307, 1988.


Optimizing the Structure of Radial Basis Function Networks by.. - Wienholt (1993)   (4 citations)

No context found.

Robert A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295 -- 307, 1988.


Empirical Evaluation of the Improved Rprop Learning Algorithms - Igel, Hüsken (2003)   (2 citations)

No context found.

R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4):295--307, 1988.


Studies of Model Selection and Regularization for Generalization in .. - Guo

No context found.

R. A. Jacobs, "Increased Rates of Convergence through Learning Rate Adaptation," Neural Networks, vol. 1, pp. 295--307, 1988.


Digital Signal Processing 11, 204--221 (2001) - Doi Dspr Available (2001)

No context found.

Jacobs, R. A., Increased rates of convergence through learning rate adaptation. Neural Networks 1 (1988), 295--307.


Fuzzy Parameter Adaptation in Optimization: Some.. - Arabshahi, Choi.. (1996)   (1 citation)

No context found.

R.A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation," Neural Networks, Vol. 1, 1988, pp. 295-307.


Surprising Predictions - Draft Do Not

No context found.

Jacobs, RA (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307.


CiteSeer.IST - Copyright Penn State and NEC