6 citations found.
Y. Le Cun, P. Y. Simard, and B. A. Pearlmutter, "Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors," in Advances in Neural Information Processing Systems 5, S. J. Hanson, J. D. Cowan, and C. L. Giles, Eds. San Mateo, CA: Morgan Kaufmann, 1993, pp. 156--163.

This paper is cited in the following contexts:
Deterministic Nonmonotone Strategies for Effective.. - Plagianakos..   (Correct)

....initial vector . It also requires that is chosen to satisfy the relation , where denotes the Hessian of at , in some bounded region, where the relation holds. However, the manipulation of the full Hessian is too expensive in computation and storage for MLPs with several hundred weights [7]. In [36], a technique based on appropriate perturbations of the weights has been proposed for the online estimation of the extreme eigenvalues and eigenvectors of the Hessian without calculating the full matrix . According to experiments reported in [36], the largest eigenvalue of the Hessian is mainly ....

....for MLPs with several hundred weights [7]. In [36], a technique based on appropriate perturbations of the weights has been proposed for the online estimation of the extreme eigenvalues and eigenvectors of the Hessian without calculating the full matrix . According to experiments reported in [36], the largest eigenvalue of the Hessian is mainly determined by the MLP architecture, the initial weights and by short term low order statistics of the training data. This technique can be used to determine requiring additional presentations of the training set in the early training. Cauchy's ....
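
The excerpt above mentions estimating Hessian information from perturbations of the weights without forming the full matrix. A minimal sketch of that general idea follows, assuming a forward finite difference of gradients, H(w) v ≈ (∇E(w + εv) − ∇E(w)) / ε. The toy quadratic error, the function names `grad` and `hessian_vector_product`, and the step size `eps` are illustrative assumptions, not the cited authors' implementation.

```python
def grad(w):
    # Gradient of a toy quadratic E(w) = 0.5 * w^T A w with A = [[3, 1], [1, 2]].
    # This error function is an assumption chosen so the Hessian is known (H = A).
    return [3.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 2.0 * w[1]]


def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H(w) v using two gradient evaluations and no explicit Hessian."""
    w_perturbed = [wi + eps * vi for wi, vi in zip(w, v)]
    g0 = grad_fn(w)            # gradient at the current weights
    g1 = grad_fn(w_perturbed)  # gradient at the perturbed weights
    return [(b - a) / eps for a, b in zip(g0, g1)]


# Probing direction v = (1, 0) recovers (approximately) the first column of A.
hv = hessian_vector_product(grad, [0.5, -1.0], [1.0, 0.0])
```

The cost is two gradient evaluations per product, independent of the number of weights squared, which is why this kind of estimate avoids the storage problem described in the excerpt.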

Y. Le Cun, P. Y. Simard, and B. A. Pearlmutter, "Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors," in Advances in Neural Information Processing Systems 5, S. J. Hanson, J. D. Cowan, and C. L. Giles, Eds. San Mateo, CA: Morgan Kaufmann, 1993, pp. 156--163.


Speaker Normalization Improvement by Neural.. - Autiero.. (1999)   (Correct)

....recognition rate does not increase in the same way. Furthermore, more than 200 epochs are required to achieve acceptable performance. 5.2 Parameter optimization results 5.2.1 Learning rate and momentum term Many procedures exist for setting gradient descent plus momentum automatically [19], [20], [21]. However, given the large variance, the choice has been made more theoretically than experimentally. For sufficiently small values of the learning rate, the average direction of motion in weight space approximates the negative of the local gradient [10]. So the learning rate has been fixed ....

Y. LeCun, P. Y. Simard, and B. Pearlmutter, "Automatic Learning Rate Maximization by On-line Estimation of the Hessian Eigenvectors", in Advances in Neural Information Processing Systems, Vol. 5, pp. 156--163, San Mateo, CA: Morgan Kaufmann, 1993.


Nonmonotone Methods for Backpropagation Training with.. - Plagianakos, Vrahatis   (Correct)

....chosen to satisfy the relation sup #H(w)## # 1 # in some bounded region where the relation E(w) # E(w 0 ) holds, where H(w) denotes the Hessian of E at w. However, the manipulation of the full Hessian is too expensive in computation and storage for FNNs with several hundred weights [4]. Le Cun [14] proposed a technique, based on appropriate perturbations of the weights, for estimating on line the extreme eigenvalues and eigenvectors of the Hessian without calculating the full matrix H. According to experiments reported in [14], the largest eigenvalue of the Hessian is mainly determined by the FNN ar ....

....computation and storage for FNNs with several hundred weights [4]. Le Cun [14] proposed a technique, based on appropriate perturbations of the weights, for estimating on line the extreme eigenvalues and eigenvectors of the Hessian without calculating the full matrix H. According to experiments reported in [14], the largest eigenvalue of the Hessian is mainly determined by the FNN architecture, the initial weights and by short term low order statistics of the training data. This technique could be used to determine # requiring additional presentations of the training set in the early training. An alternative ....

Y. Le Cun, P.Y. Simard, and B.A. Pearlmutter, Automatic learning rate maximization by on--line estimation of the Hessian's eigenvectors, in Advances in Neural Information Processing Systems 5, S.J. Hanson, J.D. Cowan, and C.L. Giles (eds.), 156--163, Morgan Kaufmann, San Mateo, CA, (1993).


A Framework for the Development of Globally.. - Magoulas..   (Correct)

....# # 0 . 2. Set # k = #, where # is such that # # # # (2# 0 #) and go to the next step. 3. Update the weights w k 1 = w k # k #E(w k ) However, the manipulation of the full Hessian is too expensive in computation and storage for FNNs with several hundred weights [3]. Le Cun [11] proposed a technique, based on appropriate perturbations of the weights, for estimating on line the principal eigenvalues and eigenvectors of the Hessian without calculating the full matrix H. According to experiments reported in [11], the largest eigenvalue of the Hessian is mainly determined ....

....storage for FNNs with several hundred weights [3]. Le Cun [11] proposed a technique, based on appropriate perturbations of the weights, for estimating on line the principal eigenvalues and eigenvectors of the Hessian without calculating the full matrix H. According to experiments reported in [11], the largest eigenvalue of the Hessian is mainly determined by the FNN architecture, the initial weights and by short term low order statistics of the training data. This technique could be used to determine # 0 , in Step 1 of the above algorithm, requiring additional presentations of the ....

Y. Le Cun, P.Y. Simard and B.A. Pearlmutter, Automatic learning rate maximization by on--line estimation of the Hessian's eigenvectors, in Advances in Neural Information Processing Systems 5, S.J. Hanson, J.D. Cowan, and C.L. Giles (eds.), pp. 156--163, Morgan Kaufmann, San Mateo, CA, 1993.


Using Curvature Information for Fast Stochastic Search - Orr, Leen (1997)   (5 citations)  (Correct)

....this calculation. Thus, to compute the entire weight update requires two forward backward propagations, one for the gradient calculation and one for computing H t Delta t . The only constraint on 0 is that 0 1=max . We use the on line algorithm developed by LeCun, Simard, and Pearlmutter [13] to find the largest eigenvalue prior to the start of training. 3 Examples In the following two subsections we examine the behavior of annealed learning with adaptive momentum on networks previously trained to a point close to an optimum, where the noise dominates. We look at very simple linear ....

Yann LeCun, Patrice Y. Simard, and Barak Pearlmutter. Automatic learning rate maximization by on-line estimation of the hessian's eigenvectors. In Giles, Hanson, and Cowan, editors, Advances in Neural Information Processing Systems, vol. 5, San Mateo, CA, 1993. Morgan Kaufmann.


Efficient BackProp - LeCun, Bottou, Orr, Müller (1998)   (7 citations)  Self-citation (Lecun)   (Correct)

....This method can be applied to compute the principal eigenvector and eigenvalue of H by the power method. By iterating and setting $\Psi(t+1) = H\Psi(t) / \|\Psi(t)\|$ (60), the vector $\Psi(t)$ will converge to the largest eigenvector of H and $\|\Psi(t)\|$ to the corresponding eigenvalue [23, 14, 10]. See also [33] for an even more accurate method that (1) does not use finite differences and (2) has similar complexity. 8 Analysis of the Hessian in multi layer networks It is interesting to understand how some of the tricks shown previously influence the Hessian, i.e. how does the Hessian ....
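
The power-method iteration quoted in the excerpt above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 2x2 matrix `H` is an explicit toy stand-in for a Hessian (in a network, the product H Psi would come from gradient evaluations rather than a stored matrix), its dominant eigenvalue is positive, and the helper names `mat_vec` and `power_iteration` are hypothetical.

```python
import math


def mat_vec(H, v):
    # Plain matrix-vector product; stands in for a Hessian-vector product.
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def power_iteration(H, psi, steps=100):
    """Iterate Psi(t+1) = H Psi(t) / ||Psi(t)||.

    Returns (||Psi||, Psi) after `steps` updates. For a dominant positive
    eigenvalue, the norm approaches that eigenvalue and Psi aligns with
    the corresponding eigenvector.
    """
    for _ in range(steps):
        norm = math.sqrt(sum(x * x for x in psi))
        psi = [x / norm for x in mat_vec(H, psi)]
    return math.sqrt(sum(x * x for x in psi)), psi


# Toy symmetric matrix with eigenvalues 2 and 1 (an assumption for illustration).
H = [[2.0, 0.0], [0.0, 1.0]]
eigenvalue, psi = power_iteration(H, [1.0, 1.0])
```

Convergence speed depends on the ratio between the two largest eigenvalue magnitudes; here the component along the smaller eigenvector shrinks by a factor of one half per step.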

....and details of the implementation. Typically, the eigenvalue distribution of the Hessian looks like the one sketched in Figure 20: a few small eigenvalues, many medium ones and few very large ones. We will now argue that the large eigenvalues will cause the trouble in the training process because of [23, 22]: non-zero mean inputs or neuron states [22]; wide variations of the second derivatives from layer to layer; correlation between state variables. To exemplify this, we show the eigenvalue distribution of a network trained on OCR data in Figure 20. Clearly, there is a wide spread of ....

Y. LeCun, P. Y. Simard, and B. Pearlmutter. Automatic learning rate maximization by on-line estimation of the hessian's eigenvectors. In Giles, Hanson, and Cowan, editors, Advances in Neural Information Processing Systems, vol. 5, San Mateo, CA, 1993. Morgan Kaufmann.

CiteSeer.IST - Copyright Penn State and NEC