4 citations found.
Y. LeCun, Modèles connexionnistes de l'apprentissage, Ph.D. thesis, Université Paris 6: Paris, 1987.


This paper is cited in the following contexts:
Behaviour in 0 of the Neural Networks Training Cost - Goutte (1997)

....training, numerical optimisation, regularisation. 1. Introduction. Most algorithms used for training neural network models minimise a cost function with multi-dimensional optimisation algorithms using derivative information. Common examples are steepest descent, stochastic gradient descent [6, 4], conjugate gradient [5] or second order methods [1]. We will focus our attention on the behaviour of the cost function in 0. For that purpose, we derive in sections 3 to 5 the expressions of the derivatives of arbitrary order. These rather straightforward expressions show that a large number of ....

Y. LeCun, Modèles connexionnistes de l'apprentissage, Ph.D. thesis, Université Paris 6: Paris, 1987.


Forward models: Supervised learning with a distal teacher - Jordan, Rumelhart (1992)   (108 citations)

....structure (see Figure 9). In the current section we consider a more specialized problem formulation in which the focus is on particular classes of target trajectories. This formulation is based on variational calculus and is closely allied with methods in optimal control theory (Kirk, 1970; LeCun, 1987). The algorithm that results is a form of backpropagation in time (Rumelhart, Hinton, & Williams, 1986) in a recurrent network that incorporates a learned forward model. The algorithm differs from the algorithm presented above in that it not only inverts the relationship between actions and ....

.... stochastic gradient of $J$, that is, the gradient evaluated along a particular sample trajectory $y_\alpha$: $J_\alpha = \frac{1}{2} \sum_{n=1}^{N_\alpha} (y^*_\alpha[n] - y_\alpha[n])^T (y^*_\alpha[n] - y_\alpha[n])$ (16). The gradient of this cost functional can be obtained using the calculus of variations (see also LeCun, 1987; Narendra & Parthasarathy, 1990). Letting $\Phi[n]$ represent the vector of partial derivatives of $J_\alpha$ with respect to $x_\alpha[n]$ and letting $\Psi[n]$ represent the vector of partial derivatives of $J_\alpha$ with respect to $u_\alpha[n]$, Appendix A shows that the gradient of $J_\alpha$ is given by the following ....

LeCun, Y. (1987). Modèles connexionnistes de l'apprentissage. Unpublished doctoral dissertation, Université de Paris, VI.


Gradient-Based Learning Applied to Document Recognition - LeCun, Bottou, Bengio.. (1998)   (63 citations)  Self-citation (LeCun)

.... learning did not use gradients, but virtual targets for units in intermediate layers [LeCun, 1985, LeCun, 1986] or minimal disturbance arguments [Parker, 1985]. The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back propagation [LeCun, 1987], and for deriving generalizations of back propagation to recurrent networks [LeCun, 1988] and networks of heterogeneous modules [Bottou and Gallinari, 1991]. A simple derivation for generic multi layer systems is given in Section 1.5. The fact that local minima do not seem to be a problem for ....

....because their outputs are not observable from the outside. In more complex situations than the simple cascade of modules described above, the partial derivative notation becomes somewhat ambiguous and awkward. A completely rigorous derivation in more general cases can be done using Lagrange functions [LeCun, 1987, LeCun, 1988, Bottou and Gallinari, 1991]. Traditional multi layer neural networks are a special case of the above where the state information Xn is represented with fixed sized vectors, and where the modules are alternated layers of matrix multiplications (the weights) and component wise ....

[Article contains additional citation context not shown here]

LeCun, Y. (1987). Modèles connexionnistes de l'apprentissage (connectionist learning models). PhD thesis, Université P. et M. Curie (Paris 6).


Gradient-Based Learning Applied to Document Recognition - LeCun, Bottou, Bengio.. (1998)   (63 citations)  Self-citation (LeCun)

.... context of neural network learning did not use gradients, but virtual targets for units in intermediate layers [17], [18] or minimal disturbance arguments [19]. The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back propagation [20], and for deriving generalizations of back propagation to recurrent networks [21] and networks of heterogeneous modules [22]. A simple derivation for generic multi layer systems is given in Section I-E. The fact that local minima do not seem to be a problem for multi layer neural networks is ....

....because their outputs are not observable from the outside. In more complex situations than the simple cascade of modules described above, the partial derivative notation becomes somewhat ambiguous and awkward. A completely rigorous derivation in more general cases can be done using Lagrange functions [20], [21], [22]. Traditional multi layer neural networks are a special case of the above where the state information Xn is represented with fixed sized vectors, and where the modules are alternated layers of matrix multiplications (the weights) and component wise sigmoid functions (the neurons) ....

[Article contains additional citation context not shown here]

Y. LeCun, Modèles connexionnistes de l'apprentissage (connectionist learning models), Ph.D. thesis, Université P. et M. Curie (Paris 6), June 1987.


CiteSeer.IST - Copyright Penn State and NEC