97 citations found.
R. Battiti. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation, 4(2):141-166, 1992.

This paper is cited in the following contexts:

An Automatic Microcalcification Detection System Based .. - Papadopoulos.. (2002)

....variance. Several training algorithms were implemented and tested: gradient descent methods, resilient backpropagation, conjugate gradient methods, and quasi-Newton methods [5]. The best results are obtained using a quasi-Newton method, and more specifically, the one-step secant (OSS) algorithm [2]. To assess the performance of several architectures and training algorithms, the two-fold cross-validation method was employed. According to this procedure, the dataset is randomly divided into two subsets where the number of positive and negative cases in each subset is approximately equal. In a ....

Battiti R. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Comput 1992;4(2):141-66.


A New Class of Quasi-Newtonian Methods for Optimal .. - Bortoletti, Di.. (2003)

....based on the iterative equation (1), where the new matrix is a rank-2 perturbation of the previous positive definite (pd) Hessian approximation. As a consequence, the BFGS search direction and the updated matrix are easily computable (by means of the Sherman-Morrison formula) once the previous approximation is known. In [4] and [24], two heuristic OSS variants of QN BFGS algorithms were introduced to reduce the number of operations per step from $O(n^2)$ to $O(n)$. The latter methods were essentially memoryless modifications of the classical iterative procedure to approximate the Hessian matrix or ....

....complexity per step to a small number of fast transforms diagonalizing the relevant matrices. In this way one obtains $O(n \log n)$ flops instead of the $O(n^2)$ of BFGS. The space complexity is also reduced, from $O(n^2)$ to $O(n)$. Notice that the limited-memory BFGS method (L-BFGS) [1], [27], [30], [31] and the memoryless OSS methods [4], [24], [37] turn out to be particular cases of the algorithm (6). The main idea in the L-BFGS method is to use second-order BFGS information from the most recent iterations. In L-BFGS the matrix depends on a limited number of pairs of vectors (steps and gradient differences). More precisely, we have (7) ....

[Article contains additional citation context not shown here]

R. Battiti, "First- and second-order methods for learning: Between steepest descent and Newton's method," Neural Comput., vol. 4, pp. 141-166, 1992.
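The L-BFGS method mentioned in the Bortoletti et al. context builds the search direction from a few recent (step, gradient-change) pairs without ever forming an n × n matrix. The sketch below is a minimal illustration of the standard two-loop recursion, assuming an identity initial matrix H0 and pairs stored oldest first; `lbfgs_direction` and `bfgs_inverse_dense` are hypothetical names, not code from either paper, and the dense version exists only to check the result.

```python
import numpy as np


def lbfgs_direction(grad, s_list, y_list):
    """Return -H @ grad, where H is the L-BFGS inverse-Hessian approximation
    built from pairs (s_i, y_i) stored oldest first, starting from H0 = I."""
    q = np.array(grad, dtype=float)
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        q = q - alpha * y
        alphas.append(alpha)
    r = q  # apply H0 = I (an assumption; scaled initial matrices are common)
    # Second loop: oldest pair to newest, reusing the alphas in reverse order.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * (y @ r)
        r = r + (alpha - beta) * s
    return -r


def bfgs_inverse_dense(s_list, y_list):
    """Reference only: apply the same BFGS inverse updates to an explicit
    n x n matrix, costing O(n^2) memory."""
    n = len(s_list[0])
    H = np.eye(n)
    for s, y in zip(s_list, y_list):  # oldest pair first
        rho = 1.0 / (y @ s)
        V = np.eye(n) - rho * np.outer(y, s)
        H = V.T @ H @ V + rho * np.outer(s, s)
    return H
```

Keeping only the most recent pair reduces this to the memoryless (OSS-style) case described in the same context.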


New Second-Order Algorithms for Recurrent Neural.. - Campolucci.. (1998)

....convergence rate and so are not suited for some DSP problems, especially for online computations. Second-order algorithms instead perform better because they also use the second-order information stored in the Hessian matrix. There are several examples of these algorithms in the literature [3,4], but a subclass of them, based on the conjugate gradient method, has shown good properties in terms of rate of convergence and computational complexity. Conjugate direction methods are based on choosing the search direction and the step size of a minimization formula by using second-order ....

Battiti R., "First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method", Neural Computation, 4, 1992.
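The conjugate gradient idea in the Campolucci et al. context can be seen most simply on a quadratic error surface. A minimal sketch, assuming a symmetric positive definite matrix A: each new direction is A-conjugate to the previous ones and the step comes from an exact line search, so in exact arithmetic the minimum is reached in at most n iterations. Training nonlinear networks requires nonlinear variants with numerical line searches, which this example does not cover; `conjugate_gradient` is a hypothetical name.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10):
    """Minimize f(w) = 0.5 w^T A w - b^T w for symmetric positive definite A."""
    n = len(b)
    w = np.zeros(n)
    r = b - A @ w          # residual = negative gradient of f at w
    d = r.copy()           # first direction is plain steepest descent
    for _ in range(n):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d
        step = rr / (d @ Ad)   # exact line search along d
        w = w + step * d
        r = r - step * Ad
        beta = (r @ r) / rr    # Fletcher-Reeves ratio
        d = r + beta * d       # new direction, A-conjugate to the earlier ones
    return w
```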


Training Neural Nets with the Reactive Tabu Search - Battiti, Tecchiolli

....Available techniques to improve the safety and the speed of convergence of BP are, for example, the use of adaptive learning rates for online BP [30] and the use of line searches and second-order information for batch BP; see [7], [3] and the references therein. B. Event Discrimination in High Energy Physics. Experimental HEP facilities need state-of-the-art discrimination systems for selecting and classifying the relevant events. In a typical facility, colliding particles produce streams of secondary particles called ....

....different algorithms. The same benchmark task has been used in [7] for comparing different training algorithms: i) the backpropagation algorithm [39]; ii) a version of gradient descent with adaptive step; iii) the conjugate gradient technique; iv) the One-Step Secant method with fast line searches [3]; v) two versions of the stochastic search technique of [41], [15] (in particular, the new proposal called affine shaker). The patterns used for training and testing the neural classifiers have been produced with the COJETS event generator [36] using the natural frequencies. A total of 100,000 ....

R. Battiti, "First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method," Neural Computation, vol. 4, no. 2, pp. 141-166, 1992.


The ATR HIP Laboratories Minimum Error.. - Biem, Katagiri.. (2001)

....training is stopped after a fixed number of iterations. 6.3.8 Modified Newton's methods: the Quickprop algorithm. In addition to optimizing the MCE loss function with the GPD method, conjugate gradient methods and second-order techniques such as modified Newton's methods can also be used [Battiti, 1992]. Many of these methods can be used in batch mode, where the classifier parameters are only updated after a presentation of the entire training set. This makes it possible to parallelize the optimization phase by distributing training data over several machines. This is not so easy to implement ....

....quadratic model of $F(\Lambda)$ is correct, and the Hessian matrix $\nabla^2 F(\Lambda)$ is positive definite, the minimum will be reached in one iteration. In general, if the Hessian matrix is positive definite and the initial value of $\Lambda$ is sufficiently close to the optimum, Newton's method converges q-quadratically [Battiti, 1992]. The first difficulty in using Newton's method for practical optimization problems is that calculating the Hessian matrix (whose size grows as the square of the size of the parameter vector $\Lambda$) may be prohibitive. Secondly, if the Hessian is not positive definite, there will be directions of ....

[Article contains additional citation context not shown here]

Battiti, R. (1992). First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method. Neural Computation, Vol. 4, pp. 141-166.
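The Newton's-method claims in the Biem and Katagiri context (one-step convergence when the quadratic model is exact, q-quadratic convergence near a minimum with positive definite Hessian) can be checked numerically. A minimal sketch; the test functions and the helper `newton_step` are illustrative assumptions, not taken from the cited works.

```python
import numpy as np


def newton_step(grad, hess, w):
    """One Newton step w - H(w)^{-1} g(w), using a linear solve rather than
    forming the inverse Hessian explicitly."""
    return w - np.linalg.solve(hess(w), grad(w))


# Quadratic F(w) = 0.5 w^T A w - b^T w with positive definite A:
# a single step from any starting point lands on the minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
w1 = newton_step(lambda w: A @ w - b, lambda w: A, np.array([10.0, -7.0]))

# Non-quadratic f(x) = exp(x) - 2x (minimum at x = ln 2): near the optimum,
# the error is roughly squared at each step (q-quadratic convergence).
x = np.array([0.0])
errors = []
for _ in range(6):
    x = newton_step(lambda v: np.exp(v) - 2.0,
                    lambda v: np.array([[np.exp(v[0])]]), x)
    errors.append(abs(x[0] - np.log(2.0)))
```

The linear solve is used because it is cheaper and numerically safer than inverting the Hessian, although it still costs $O(n^3)$ for a dense $n \times n$ matrix, which is the difficulty the context raises.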


Discriminative Prototype-Based Methods For Speech Recognition - McDermott

....cycling over the training data, while diminishing the step size appropriately (in practice, training can be stopped after a fixed number of iterations). Conjugate gradient methods, as well as second-order techniques such as modified Newton's methods, can also be used. The reader is referred to [13] for a good review of different optimization methods. Many of these methods can be used in batch mode, where the classifier parameters are only updated after a presentation of the entire training set. This makes it possible to parallelize the optimization phase by distributing training data ....

....quadratic model of $F(\Lambda)$ is correct, and the Hessian matrix $\nabla^2 F(\Lambda)$ is positive definite, the minimum will be reached in one iteration. In general, if the Hessian matrix is positive definite and the initial value of $\Lambda$ is sufficiently close to the optimum, Newton's method converges q-quadratically [13]. The first difficulty in using Newton's method for practical optimization problems is that calculating the Hessian matrix (whose size grows as the square of the size of the parameter vector $\Lambda$) may be prohibitive. Secondly, if the Hessian is not positive definite, there will be directions of ....

[Article contains additional citation context not shown here]

R. Battiti. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method. Neural Computation, 4:141-166, 1992.


Connectionist Models for Intelligent Reactive Power Control - Ajith Abraham, Baikunth (2000)

....Unfortunately, it is computationally expensive to derive the Hessian matrix for a feedforward ANN. In a quasi-Newton (or secant) method, an approximate Hessian matrix is updated at each iteration of the algorithm. The update is computed as a function of the gradient [17]. The One Step Secant (OSS) [18] method is an attempt to bridge the gap between conjugate gradient algorithms and quasi-Newton algorithms, which have higher storage and per-iteration computation requirements. This algorithm does not store the complete Hessian matrix; it assumes that at each iteration ....

Battiti R, First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, Neural Computation, Vol. 4, No. 2, pp. 141-166, 1992.
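The OSS idea in the Abraham context avoids storing the Hessian by treating the previous approximation as the identity, so the search direction reduces to a combination of the gradient, the last step, and the last gradient change. A minimal sketch, assuming the memoryless-BFGS form of the update; the coefficient names `a` and `b` and both function names are ours, and the dense reference version is included only for comparison.

```python
import numpy as np


def oss_direction(g, s, y):
    """One-step secant (memoryless BFGS) search direction.

    g: gradient at the current weights w_k
    s: last step, w_k - w_{k-1}
    y: gradient change, g_k - g_{k-1}
    Returns -H g, where H is a single BFGS update of the identity matrix,
    computed with O(n) work and storage (no n x n matrix is formed).
    """
    sy = s @ y  # must be positive for a guaranteed descent direction
    a = -(1.0 + (y @ y) / sy) * (s @ g) / sy + (y @ g) / sy
    b = (s @ g) / sy
    return -g + a * s + b * y


def oss_direction_dense(g, s, y):
    """Reference version that forms the n x n matrix explicitly (O(n^2))."""
    n = len(g)
    rho = 1.0 / (s @ y)
    V = np.eye(n) - rho * np.outer(y, s)
    H = V.T @ V + rho * np.outer(s, s)
    return -H @ g
```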


Discriminative Training for Speech Recognition - McDermott (1997)   (1 citation)

....presentation of all training samples, according to $\Lambda_{t+1} = \Lambda_t - \epsilon_t \nabla L_1(\Lambda_t)$ (2.41), which corresponds to what is often referred to as the batch method of optimization. The Steepest Descent approach is to determine $\epsilon_t$ through a line search along the direction determined by $\nabla L_1(\Lambda_t)$ [Battiti, 1992]. Many other gradient-based methods exist, such as conjugate gradient methods and modified Newton's methods. In most of the experiments described in the following, the update method used was the sequential update rule. This kind of sequential, stochastic method has been found by many to be ....

....quadratic model of $F(\Lambda)$ is correct, and the Hessian matrix $\nabla^2 F(\Lambda)$ is positive definite, the minimum will be reached in one iteration. In general, if the Hessian matrix is positive definite and the initial value of $\Lambda$ is sufficiently close to the optimum, Newton's method converges q-quadratically [Battiti, 1992]. The first difficulty in using Newton's method for practical optimization problems is that calculating the Hessian matrix (whose size grows as the square of the size of the parameter vector $\Lambda$) may be prohibitive. Secondly, if the Hessian is not positive definite, there will be directions of ....

[Article contains additional citation context not shown here]

Battiti, R. (1992). First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method. Neural Computation, Vol. 4, pp. 141-166.
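The McDermott context describes batch steepest descent with the step size $\epsilon_t$ chosen by a line search along the gradient direction. A minimal sketch on a quadratic surface, chosen because there the exact line-search step has a closed form; a general network error function would need a numerical line search instead. `steepest_descent` is a hypothetical name.

```python
import numpy as np


def steepest_descent(A, b, w0, iters=500, tol=1e-10):
    """Batch steepest descent on F(w) = 0.5 w^T A w - b^T w (A SPD).

    The step size is chosen by an exact line search along -grad, which
    for a quadratic has the closed form (g . g) / (g . A g).
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        g = A @ w - b          # full-batch gradient
        gg = g @ g
        if np.sqrt(gg) < tol:
            break
        eps = gg / (g @ (A @ g))
        w = w - eps * g
    return w
```

Even with an exact line search, successive steps are orthogonal and zigzag across elongated valleys, so the iteration count grows with the condition number of A; this is the weakness that conjugate gradient and Newton-type methods address.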


Declaration - This Dissertation Is

....an assortment of expected values. These are immediately available if $P_\lambda(O^{(u)})$ was calculated using the Baum-Welch algorithm. HMMs are therefore unusual in that once the probability has been found, the derivatives can be calculated with very little extra work. 4.3 Steepest Ascent. This algorithm [4, 5] uses the update $\lambda_k = \lambda_{k-1} + \epsilon \, \nabla_\lambda \log P$ (4.30). In effect, we have replaced the Hessian term in the heuristic equation 4.4 with the identity matrix. The learning rate $\epsilon$ can either be fixed by the user or, more expensively, found by a line search (see 4.10). For small $\epsilon$ the method traces out ....

....we duplicate the training data 10 times, the work required by equation 4.30 increases by the same factor. Clearly we would do better by updating 10 times more often. This process taken to the extreme, when we update after every training pattern, leads to an algorithm known as stochastic approximation [2, 5, 77]. The update equation now becomes $\lambda_n = \lambda_{n-1} + a_n \, \nabla_\lambda \log P_n$ (4.33), where $\nabla_\lambda \log P_n$ is the gradient on the n-th randomly chosen training example. It can be shown that the algorithm converges to a stationary point with probability one if $a_n$ satisfies $\lim_{n \to \infty} a_n = 0$ (4.34), $\sum_n a_n = \infty$ (4.35), and $a_n > 0$ (4.36) ....

[Article contains additional citation context not shown here]

Roberto Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.
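The stochastic approximation passage in the dissertation context replaces the batch gradient with the gradient of one randomly chosen example and a decreasing step size $a_n$. A minimal sketch on the simplest case, estimating a mean, assuming the common choice $a_n = 1/n$ (it tends to zero while its sum diverges); with this schedule the iterate equals the running sample mean, which makes the behavior easy to verify. `stochastic_mean` is a hypothetical name.

```python
import numpy as np


def stochastic_mean(samples):
    """Stochastic approximation of argmax_x E[-0.5 (z - x)^2], i.e. E[z].

    Each update uses the gradient from a single sample, (z - x), with
    the decreasing step size a_n = 1/n.
    """
    x = 0.0
    for n, z in enumerate(samples, start=1):
        a_n = 1.0 / n
        x = x + a_n * (z - x)   # one-sample gradient-ascent step
    return x


rng = np.random.default_rng(3)
z = rng.normal(3.0, 1.0, size=10_000)   # synthetic data, true mean 3.0
estimate = stochastic_mean(z)
```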


Sensitivity Analysis for Selective Learning by Feedforward.. - Engelbrecht (2001)   (1 citation)

.... Current research mostly concentrates on the optimal setting of initial weights [2, 3], optimal learning rates and momentum [4, 5, 6, 7], finding optimal NN architectures using pruning techniques [8, 9, 10, 11, 12, 13] and construction techniques [14, 15, 16], sophisticated optimization techniques [17, 18, 19, 20, 21, 22], and adaptive activation functions [23, 24, 25]. This paper presents an alternative approach to improve generalization and training time, i.e. active learning using sensitivity analysis. Standard error-backpropagating NNs are passive learners. These networks passively receive information about ....

Battiti, R.: First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, Neural Computation, 4, 1992, 141-166


Using the Taylor Expansion of Multilayer Feedforward Neural.. - Engelbrecht

....$E(\theta + \Delta\theta_i) = E(\theta) + E'(\theta)\,\Delta\theta_i + \frac{1}{2} E''(\theta)\,\Delta\theta_i^2 + \cdots$ (4). The first-order term $E'(\theta)$ is used in gradient descent optimization to drive the NN to a local minimum [26]. In this case $\theta_i$ represents a weight of the NN. The second-order term has also been used in optimization to improve convergence [3, 4]. Objective function sensitivity analysis has been used widely in pruning of NN parameters. Optimal Brain Damage (OBD) [19, 24] and Optimal Brain Surgeon (OBS) [21, 22] prune weights with low saliency, while Optimal Cell Damage (OCD) [6] prunes irrelevant input and hidden units. OBD, OBS and OCD ....

R. Battiti, First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, Neural Computation, Vol. 4, 1992, pp. 141-166.
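The Engelbrecht context notes that the second-order Taylor term underlies pruning methods such as Optimal Brain Damage. A minimal sketch of OBD-style saliencies, assuming the usual diagonal-Hessian approximation at a local minimum (so the first-order term is dropped); the Hessian diagonal is supplied directly here rather than computed from a network, and both function names are hypothetical.

```python
import numpy as np


def obd_saliencies(weights, hessian_diag):
    """Predicted error increase from setting each weight to zero, using only
    the diagonal second-order Taylor term: 0.5 * h_ii * w_i^2."""
    w = np.asarray(weights, dtype=float)
    h = np.asarray(hessian_diag, dtype=float)
    return 0.5 * h * w ** 2


def prune_lowest(weights, hessian_diag, k):
    """Return a copy of the weights with the k lowest-saliency entries zeroed."""
    w = np.asarray(weights, dtype=float).copy()
    order = np.argsort(obd_saliencies(w, hessian_diag))
    w[order[:k]] = 0.0
    return w


w = [1.0, -0.1, 2.0, 0.5]
h = [1.0, 10.0, 0.01, 4.0]       # illustrative curvature values
pruned = prune_lowest(w, h, k=2)  # zeros indices 2 and 1
```

Here the largest-magnitude weight (2.0) is pruned because its curvature is small, which is where saliency-based pruning differs from pruning by weight magnitude.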


A Theoretical Framework for Local Adaptive Networks - Weaver (1999)   (2 citations)

....increase quickly as the size of the network increases. Because of the large number of weights, standard neural network algorithms based on second-order models are usually designed to avoid computation of the Hessian. For example, variants of Newton's method do not require knowledge of the Hessian [7]. Similarly, in the conjugate gradient method, which computes a conjugate direction and then performs a line search, the conjugate direction can be computed much faster (because of its algebraic form) than the Hessian matrix [40]. Other methods, called quasi-second-order methods ....

R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.


Neural Network Models for Intelligent Networks: Deriving .. - Battiti, Villani, Le.. (2002)   Self-citation (Battiti)

....function to a desired accuracy, provided that the number of hidden neurons is sufficiently large [11]. In this work we consider a single-hidden-layer MLP and a training technique that uses second-derivative information: the one-step secant method with fast line searches (OSS) introduced in [3, 4]. The one-step secant method OSS is a variation of what is called the one-step (memoryless) Broyden-Fletcher-Goldfarb-Shanno method, see [17]. The OSS method is described in detail and is used for multi-layer perceptrons in [3] and [4]. 3 System and experimental setup. Our system consists of a ....

.... method with fast line searches (OSS) introduced in [3, 4]. The one-step secant method OSS is a variation of what is called the one-step (memoryless) Broyden-Fletcher-Goldfarb-Shanno method, see [17]. The OSS method is described in detail and is used for multi-layer perceptrons in [3] and [4]. 3 System and experimental setup. Our system consists of a wireless Local Area Network based on the IEEE 802.11b standard. It is located on the first floor of a three-storey building. The floor has dimensions of 25.5 m × 24.5 m, for a total area of 624.75 m², and includes more than eleven rooms ....

R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.


Statistical Learning Theory for Location Fingerprinting in.. - Battiti et al. (2002)   (1 citation)  Self-citation (Battiti)

....can be used for the output layer; for example, the identity function can be used for unlimited output, while the sigmoidal function is more suitable for yes/no classification problems. The training technique adopted in this paper is the one-step secant (OSS) method with fast line searches [20], using second-order derivative information. Usage of neural networks in conjunction with the localization problem was proposed in [4]. 1) Learning phase complexity: The time complexity of the learning phase is usually high; for example, in our case it takes a few minutes, although it varies greatly ....

R. Battiti, "First- and second-order methods for learning: Between steepest descent and Newton's method," Neural Computation, vol. 4, pp. 141-166, 1992.


Location-Aware Computing: A Neural Network Model For.. - Battiti, Le Nhat.. (2002)   (1 citation)  Self-citation (Battiti)

....generalize in an appropriate manner when confronted with new data not present in the training set. 3.1 The One-Step Secant Method for Training Neural Networks. Efficient optimization algorithms are crucial in the learning phase of models like neural networks and have been studied, for example, in [5], [6]. Let us briefly define the notation. We consider the standard multi-layer perceptron (MLP) architecture, with weights connecting only nearby layers and the sum-of-squared-differences energy function defined as $E(w) = \frac{1}{2}\sum_{p}\sum_{i}\left(t_{pi} - o_{pi}(w)\right)^2$ (2), where ....

.... any continuous function to a desired accuracy, provided that the number of hidden neurons is sufficiently large [13]. In this work we consider a single-hidden-layer MLP and a training technique that uses second-derivative information: the one-step secant method with fast line searches (OSS), see [5], [4]. The standard backpropagation technique uses only first-order information (the gradient). In particular, the stochastic online backpropagation update is given by $w_{k+1} = w_k - \epsilon \nabla_w E_p(w_k)$ (3), where the pattern $p$ is chosen randomly from the training set at each iteration, ....

[Article contains additional citation context not shown here]

R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.


Learning with First, Second, and No Derivatives: a.. - Roberto Battiti.. (1994)   (2 citations)  Self-citation (Battiti)

....the neural net community, their utility being recognized in particular for problems with a limited number of weights (< 100) and requiring high precision in the output values. A partial bibliography and a description of the relationships between different second-order techniques was presented in [3]. In spite of their utility, the above techniques are not completely satisfactory. In particular, their realization with analog VLSI hardware is problematic, mainly because of the high precision required. In addition, many tasks are characterized by high degrees of ill-conditioning and they are ....

....on function values (obtained with a smart sampling of the neighborhood) applied to a significant discrimination task in the area of High Energy Physics. The paper is therefore divided into two parts: the first discusses training and test results obtained with a selection of the methods reviewed in [3]; the second presents the new optimization algorithm and the results obtained on the same task. The benchmark task is illustrated in Section 2. The methods based on backpropagation with modifications and the results obtained are described in Section 3, while the algorithms based on blind ....

[Article contains additional citation context not shown here]

R. Battiti, First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, Neural Computation 4 (2) (1992) 141-166.


Constructive Function Approximation: Theory and Practice - Docampo, Hush, Abdallah (1997)

No context found.

R. Battiti. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation, 4(2):141-166, 1992.


Active Object Detection - de Croon, Postma

No context found.

R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4(2):141-166, 1992.


Learning Weights for Linear Combination of Forecasting Methods - Prudencio, Ludermir (2006)

No context found.

R. Battiti. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.


Journal of Machine Learning Research 7 (2006) 1159-1182.. - Based On Sensitivity

No context found.

R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4(2):141-166, 1992.


An Information-Maximisation Approach to Blind Separation and.. - Anthony Bell and.. (1995)   (149 citations)

No context found.

Battiti R. 1992. First- and second-order methods for learning: between steepest descent and Newton's method, Neural Computation, 4(2), 141-166.


An Incremental Learning Algorithm That Optimizes Network Size and.. - Zhang (1994)   (9 citations)

No context found.

R. Battiti, "First- and second-order methods for learning: between steepest descent and Newton's method," Neural Computation, vol. 4, pp. 141-166, 1992.


Nonextensive Entropy and Regularization for Adaptive Learning - Anastasiadis, Magoulas

No context found.

R. Battiti, First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation, 4, 141-166, 1992.


CiteSeer.IST - Copyright Penn State and NEC