Johansson, ?., Dowla, F. and Goodman, ?. (1990). Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. UCRL-JC-104850, Lawrence Livermore National Laboratory.
....an error function (often a mean square error) of ANNs. The so-called learning problem here is a typical optimization problem in numerical analysis. Many improvements on the ANN learning algorithm are actually improvements over optimization algorithms [12] such as conjugate gradient methods [13], [14]. Learning is different from optimization because we want the learned system to have best generalization, which is different from minimizing an error function. The ANN with the minimum error does not necessarily have the best generalization unless there is an equivalence between ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman, "Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method," Int. J. Neural Syst., vol. 2, no. 4, pp. 291--301, 1991.
....this term does not need to be differentiable or even continuous. Weight sharing and weight decay can also be incorporated into the fitness function easily. Evolutionary training can be slow for some problems in comparison with fast variants of BP [131] and conjugate gradient algorithms [19], [132]. However, EAs are generally much less sensitive to initial conditions of training. They always search for a globally optimal solution, while a gradient descent algorithm can only find a local optimum in a neighborhood of the initial solution. For some problems, evolutionary training can be ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman, "Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method," Int. J. Neural Syst., vol. 2, no. 4, pp. 291--301, 1991.
....1988] for details. Furthermore, summing the momentum term to the one proportional to the negative gradient may produce an ascent direction, so that the error increases after the weight update. Among the researchers using conjugate gradient methods for the MLP are [Barnard and Cole 1988], [Johansson et al. 1990], [Bengio and Moore 1989], [Drago and Ridella 1991], Hinton's group in Toronto, the groups at CMU, Bell Labs, etc. A version in which the one-dimensional minimization is substituted by a scaling of the step that depends on success in error reduction and goodness of a one-dimensional quadratic ....
Johansson, E.M., F.U. Dowla, D.M. Goodman. Backpropagation Learning for Multi-Layer Feed-Forward Neural Networks Using the Conjugate Gradient Method. Lawrence Livermore National Laboratory, Preprint UCRL-JC-104850.
....Unfortunately, each iteration requires a sizable number of function calls (about 30) and gradient calls (about 25). The two variants do not present statistically significant differences. Some references about the use of conjugate gradient for training in neural networks are, for example, [18], [21] and [2]. 3.4 One Step Secant with Fast Line Search. Computing the exact Hessian requires order O(N^2) operations [8] and order O(N^2) memory to store the Hessian components; in addition, the solution of equation 13 to find the step (or search direction) in Newton's method requires ....
E. M. Johansson, F. U. Dowla and D. M. Goodman, Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method, Preprint UCRL-JC-104850, Lawrence Livermore Nat. Lab. (Sept. 1990).
....in Table 2. We assess the performance of a much wider range of algorithms than those tested in [6], namely the on-line and off-line versions of BP with momentum (BPM), resilient propagation (RPROP) [19], conjugate gradient methods of Fletcher-Reeves (CG-FR) and Polak-Ribière (CG-PR) with restarts [20], the quickprop algorithm of Fahlman (QP) [18] and the Delta-Bar-Delta algorithm of Jacobs (DBD) [17]. For each benchmark problem we performed 50 learning trials starting from different randomly chosen weights in the range -0.5 to 0.5. The maximum number of epochs per trial was set to 1000 and ....
E. M. Johansson, F. U. Dowla and D. M. Goodman, Backpropagation learning for multilayer feedforward networks using the conjugate gradient method. International Journal of Neural Systems 2 (1992) 291-301.
....training, this term does not need to be differentiable or even continuous. Weight sharing and weight decay can also be incorporated into the fitness function easily. Evolutionary training can be slow for some problems in comparison with fast variants of BP [131] and conjugate gradient algorithms [19, 132]. However, EAs are generally much less sensitive to initial conditions of training. They always search for a globally optimal solution, while a gradient descent algorithm can only find a local optimum in a neighborhood of the initial solution. For some problems, evolutionary training can be ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman, "Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method," Int'l J. of Neural Systems, vol. 2, no. 4, pp. 291--301, 1991.
....convergence of the backpropagation algorithm is very slow, on the order of 1/t at best. Numerical optimization techniques offer a rich and robust set of methods which can be applied in an attempt to improve learning rates [3]. In particular, the conjugate gradient method can be easily implemented [8] and the convergence rate is comparable with computationally expensive second order methods [15]. It is an extended version of the paper presented in the proceedings of the ICANN 95 conference (see vol. 1, p. 311) held in Paris, October 9-13. The extension covers experimental results with Net ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method. International Journal of Neural Systems, 2(4):291--301, 1992.
....to the training data. This has been shown to increase the generalization ability of the network [20]. Networks of the MLP type are the most commonly used neural nets in use today, and they are usually trained using a backpropagation algorithm [21]. A scaled conjugate gradient training method [22, 23, 24, 20] has been used in our research instead of the ubiquitous backpropagation method, with training speed gains of an order of magnitude being typical. Neural nets of the RBF type get their name from the fact that they are built from radially symmetric Gaussian functions of the inputs. Actually, the RBF ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. IEEE Transactions on Neural Networks, 1991.
....the characteristics of the extended Kalman filter methods and the EM algorithm are highlighted. 2.3.1 Second order methods. Research has shown that second order methods such as the conjugate gradient method can improve the learning speed of feedforward networks; see in particular the work of [18, 19]. However, extending the same idea to the training of recurrent networks is not as straightforward as one might expect. This is because the conjugate gradient is inherently an off-line method. Despite this limitation, conjugate gradient has been formulated as an epoch-based training algorithm for ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method. Journal of Neural Systems, 2(4):291--301, 1992.
....errors are E = 0.110 and E = 0.100 for BP and LV respectively. b) Average convergence curves for the first 5,000 epochs. The MH learning rate is $\eta_M = 0.007$. The third problem, parity [18], has been used in benchmark studies where Conjugate Gradient (CG) algorithms greatly outperformed BP [17]. I use 4, 5 and 6 dimensional parity problems, with architectures of n inputs, 8 hidden units and one output. Each network is trained for 10,000 epochs or until 100% correct classification. The weights are initialized in $[-0.1, 0.1]$. A large momentum term is used with $\alpha = 0.9$. The ....
E. Johansson, F. Dowla and D. Goodman, "Backpropagation Learning for Multilayer Feed-forward Neural Networks using the Conjugate Gradient Algorithm", Int. J. Neur. Syst. 2, 291 (1992)
....Unfortunately, each iteration requires a sizable number of function calls (about 30) and gradient calls (about 25). The two variants do not present statistically significant differences. Some references about the use of conjugate gradient for training in neural networks are, for example, [18], [21] and [2]. 3.4 One Step Secant with Fast Line Search. Computing the exact Hessian requires order O(N^2) operations [8] and order O(N^2) memory to store the Hessian components; in addition, the solution of equation 13 to find the step (or search direction) in Newton's method requires O(N ....
E. M. Johansson, F. U. Dowla and D. M. Goodman, Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method, Preprint UCRL-JC-104850, Lawrence Livermore Nat. Lab. (Sept. 1990).
....data. This has been shown to increase the generalization ability of the network [27]. Networks of the MLP type are the most commonly used neural nets in use today, and they are usually trained using a backpropagation algorithm [28]. A scaled conjugate gradient training method [29, 30, 31, 27] has instead been preferred to the ubiquitous backpropagation method, with speed gains of an order of magnitude being typical. Figure 7 shows MLP class regions resulting from varying the first two inputs to a trained 8 input, 48 hidden unit network. Neural nets of the Radial Basis Functions ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. IEEE Transactions on Neural Networks, 1991.
....to the training data. This has been shown to increase the generalization ability of the network [29]. Networks of the MLP type are the most commonly used neural nets in use today, and they are usually trained using a backpropagation algorithm [45]. A scaled conjugate gradient training method [46, 47, 28, 29] was used in our research instead of the ubiquitous backpropagation method, with training speed gains of an order of magnitude being typical. Figure 5 shows MLP class regions resulting from varying the first two inputs to a trained 8 input, 48 hidden unit network. Figure 5: MLP classification and ....
E.M. Johansson, F.U. Dowla, and D.M. Goodman. Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method. IEEE Transactions on Neural Networks. To be published.
....on the function and gradient values is obtained which is considered to be the standard conjugate gradient as applied to the training of feed-forward neural networks. As it is only briefly outlined here, the derivation and formal description of the conjugate direction principles can be found in [Johansson 92] or [Møller 93]. In this section, the vector containing the summation of the negative gradient vectors for all the pattern presentations in epoch $k$ ($-\nabla E(w_k)$) will be denoted as $g_k$. 1. Initialization. The weight vector $w_0$ is set. The initial direction $d_0$ is obtained by gradient descent ....
.... $\eta_k d_k$. 4. A new direction $d_{k+1}$ is computed. If $(k+1) \bmod W = 0$ then the algorithm is restarted with $d_{k+1} = g_{k+1}$. Otherwise $d_{k+1} = g_{k+1} + \beta_k d_k$. 5. If the minimum was reached then the search is terminated. Otherwise, a new iteration is performed: $k = k+1$ and jump to step 2. (Footnote 4) See [Johansson 92] for details on its derivation. (Footnote 5) The $\beta_k$ parameter is always calculated in order to force the consecutive directions to be conjugate. Different formulas exist to calculate it, such as the following: Fletcher-Reeves: $\beta_k = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$; Polak-Ribière: $\beta_k = g_{k+1}^T [g_{k+1}$ ....
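The restart rule and the direction update quoted above can be illustrated with a minimal Python sketch. It is not the cited authors' implementation. The function name `cg_minimize`, its parameters, and the backtracking (Armijo) line search are assumptions made for illustration, since the excerpt does not show how the step size is chosen. The Polak-Ribière formula is truncated in the excerpt, so the code uses the standard textbook form $\beta_k = g_{k+1}^T (g_{k+1} - g_k) / (g_k^T g_k)$. As in the excerpt, `g` denotes the negative gradient.

```python
import numpy as np

def cg_minimize(error, grad, w0, restart_every=None, beta_rule="FR",
                max_iter=500, tol=1e-8):
    """Nonlinear conjugate gradient with periodic restarts (illustrative sketch).

    error(w) returns the scalar error E(w); grad(w) returns its gradient.
    restart_every plays the role of W in the text (default: number of weights).
    """
    w = np.asarray(w0, dtype=float)
    W = restart_every or w.size
    g = -grad(w)              # g_k: negative gradient at w_k
    d = g.copy()              # d_0: plain gradient descent direction
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break             # gradient is (numerically) zero: stop
        if g @ d <= 0.0:
            d = g.copy()      # safeguard: fall back to steepest descent
        # Assumed line search: halve the step until the Armijo
        # sufficient-decrease condition holds.
        eta, e0, slope = 1.0, error(w), g @ d
        while error(w + eta * d) > e0 - 1e-4 * eta * slope and eta > 1e-12:
            eta *= 0.5
        w = w + eta * d
        g_new = -grad(w)
        if (k + 1) % W == 0:
            d = g_new.copy()  # periodic restart: d_{k+1} = g_{k+1}
        else:
            if beta_rule == "FR":      # Fletcher-Reeves
                beta = (g_new @ g_new) / (g @ g)
            else:                      # Polak-Ribiere (standard form)
                beta = (g_new @ (g_new - g)) / (g @ g)
            d = g_new + beta * d       # d_{k+1} = g_{k+1} + beta_k d_k
        g = g_new
    return w
```

For a network, `error` and `grad` would be the epoch error and its backpropagated gradient over all weights flattened into one vector; any differentiable function can be used to try the sketch.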
E.M. Johansson, F.U. Dowla, and D.M. Goodman. "Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method." International Journal of Neural Systems (ISSN: 0129-0657), volume 2, number 4, pages 291--301. World Scientific Publishing Company, 1992.
....version of this paper. A.1 Abstract. A supervised learning algorithm (Scaled Conjugate Gradient, SCG) is introduced. The performance of SCG is benchmarked against that of the standard backpropagation algorithm (BP) [Rumelhart et al. 86], the conjugate gradient algorithm with line search (CGL) [Johansson et al. 91] and the one-step Broyden-Fletcher-Goldfarb-Shanno memoryless quasi-Newton algorithm (BFGS) [Battiti 89]. SCG is fully automated, includes no user-dependent parameters and avoids a time-consuming line search, which CGL and BFGS use in each iteration in order to determine an appropriate step ....
.... methods, called the Conjugate Gradient Methods, are well suited to handle large scale problems in an effective way [Hestenes and Stiefel 52], [Fletcher 75], [Gill et al. 81], [Powell 77]. Several conjugate gradient algorithms have recently been introduced as learning algorithms in neural networks [Johansson et al. 91], [Battiti 89], [Møller 90b]. Johansson, Dowla and Goodman describe the theory of general conjugate gradient methods and how to apply the methods in feed-forward neural networks. They conclude that the standard conjugate gradient method with line search (CGL) is an order of magnitude faster than ....
E.M. Johansson, F.U. Dowla and D.M. Goodman (1991), Backpropagation Learning for Multi-Layer Feed-Forward Neural Networks Using the Conjugate Gradient Method, International Journal of Neural Systems, Vol. 2, No. 4, pp. 291-301.
....function to be optimized, they are typically referred to as second order methods. Over the years, a number of second order methods have been proposed [2]. In particular, the conjugate gradient method is commonly used in training BP networks due to its speed and simplicity. It has been shown [4], [5], [6], [7] that training time can be significantly reduced when feedforward networks are trained by second order methods. Recurrent neural networks (RNNs), which include feedback loops (connections by which a node's prior output influences its subsequent output), are capable of processing ....
....( 1 ) w w (17) where $e_k(n)$ is the error at the kth output neuron for the nth pattern and $p_{ijk}(n)$ is computed as in (10). Since the CGRL algorithm requires both error function and gradient to be evaluated, the calculations should be performed together to maximize efficiency. Johansson et al. [5] proposed to use the conjugate gradient method to train backpropagation networks. We extend their idea to recurrent networks as follows. 1. A starting point, $w_0$, is selected by initializing the weights between -0.5 and 0.5 randomly. The gradient $g_0$ at this point is computed (as in (17)) and an ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation Learning for Multilayer Feed-Forward Neural Networks Using the Conjugate Gradient Method. International Journal of Neural Systems, vol. 2, no. 4, 1992, p.291 - p.301.
....to neural networks instead. CG has been widely validated in the field of unconstrained nonlinear optimization [Powell, 1977; Fletcher, 1980]. Recent studies have shown that a variety of CG methods are much faster than back-propagation learning [Kramer and Sangiovanni-Vincentelli, 1989; Johansson et al., 1991; Barnard, 1992]. CG is also used on a regular basis in several research groups (cf. [Barnard and Cole, 1989; Nowlan and Hinton, 1992]). Since the goal of transfer is to speed up learning, it makes sense to base a transfer method on the fastest existing algorithm currently reliably used. ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. International Journal of Neural Systems, 2(4):291--301, 1991.
....BP catches up with LV as the errors reach their lower limit (fig. 2c). By this time the noise level in LV is zero and there is no difference between LV and BP. The third problem, n-parity, has previously been used in benchmark studies where Conjugate Gradient algorithms greatly outperformed BP [21]. We study 4, 5 and 6 dimensional parity problems, with the architecture n inputs, 8 hidden units and one output. Using more hidden units than inputs decreases the risk of getting stuck in a local minimum. Each network is trained for 10000 epochs or until 100% correct classification. The weights ....
Johansson, E., Dowla, F., and Goodman, D. 1992. Backpropagation Learning for Multilayer Feed-forward Neural Networks using the Conjugate Gradient Algorithm. Int. J. Neur. Syst. 2, 291-301.
....that convergence is slow [3] and that there are, in the usual implementation [16], two adjustable parameters, $\eta$ and $\alpha$, that have to be manually optimized for the particular problem. Conjugate gradient methods have been used for many years [5] for minimizing functions, and have recently [8] been discovered by the neural network community. The usual methods require an expensive line search or its equivalent. Møller [13] has introduced a scaled conjugate gradient method; instead of a line search, an estimate of the second derivative along the search direction is used to find the ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method. IEEE Transactions on Neural Networks, 1991.
....an error function (often a mean square error) of ANNs. The so-called learning problem here is a typical optimisation problem in numerical analysis. Many improvements on the ANN learning algorithm are actually improvements over optimisation algorithms [3] such as conjugate gradient methods [4, 5]. Learning is different from optimisation because we want the learned system to have best generalisation, which is different from minimising an error function. An ANN with the minimum error does not necessarily have the best generalisation unless there is an accurate way to measure ....
E. M. Johansson, F. U. Dowla, and D. M. Goodman. Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. Int'l J. of Neural Systems, 2(4):291--301, 1991.
....4 we compare our implementation in Clean with implementations in the languages C and Haskell, while Section 5 concludes. 1 The Conjugate Gradient Algorithm. An algorithm often used in scientific computing for solving a system of linear equations is the conjugate gradient algorithm [CSJ95, EO94, JDG92, HS93]. The algorithm itself is an iterative approximation method to solve the linear equation Ax = b. A direct method is Gaussian elimination. But only when working with symbolic computation, which provides arbitrary precision integers, can we get an exact solution. Functional languages ....
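The linear conjugate gradient iteration mentioned in this excerpt can be sketched in a few lines of Python. This is a generic textbook version, not the Clean, C, or Haskell code the citing paper compares. The name `cg_solve` and its parameters are hypothetical, and the sketch assumes A is symmetric positive definite, which the classical method requires.

```python
import numpy as np

def cg_solve(A, b, tol=1e-10, max_iter=None):
    """Iteratively solve Ax = b for symmetric positive definite A (sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)          # start from the zero vector
    r = b - A @ x                 # residual
    p = r.copy()                  # first search direction
    rs = r @ r
    for _ in range(max_iter or 10 * b.size):
        if np.sqrt(rs) < tol:
            break                 # residual small enough: converged
        Ap = A @ p
        alpha = rs / (p @ Ap)     # exact step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p # next direction, A-conjugate to the previous ones
        rs = rs_new
    return x
```

In exact arithmetic the loop terminates in at most n iterations for an n-by-n system; in floating point the result is an approximation, which matches the excerpt's remark about exact solutions.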
E.M. Johansson, F.U. Dowla, and D.M. Goodman. Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. International Journal of Neural Systems, 2(4), 1992.
No context found.
E.M. Johansson, F. Dowla, D. Goodman, Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method, Int. J. Neural Systems 2 (4) (1991) 291.
No context found.
E.M. Johansson, F.U. Dowla and D.M. Goodman, Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method, International Journal of Neural Systems, vol. 2, no. 4,