Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis.
LeCun, Denker, et al. (1990). Optimal brain damage.
Ackley, Hinton, et al. (1985). A learning algorithm for Boltzmann Machines.
Hinton (1989). Connectionist learning procedures.
Hassibi, Stork, et al. (1993). Second order derivatives for network pruning: Optimal brain surgeon.
Moody (1992). The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems.
Pineda (1987). Generalization of back-propagation to recurrent neural networks.
Becker, LeCun (1989). Improving the convergence of back-propagation learning with second order methods.
Almeida (1987). A learning rule for asynchronous perceptrons with feedback in a combinatorial environment.
Watrous (1987). Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization.
Jabri, Flower (1992). Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks.
LeCun, Kanter, et al. (1991). Second order properties of error surfaces: Learning time and generalization.
Cauwenberghs (1993). A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization.
Movellan, et al. (1991, in press). Learning continuous probability distributions with symmetric diffusion networks.
Flower, Jabri (1993). Summed Weight Neuron Perturbation: An O(N) Improvement over Weight Perturbation.
Alspector, Meir, et al. (1993). A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks.
Kirk, Kerns, et al. (1993). Analog VLSI Implementation of Gradient Descent.
LeCun, Simard, et al. (1993). Automatic learning rate maximization in large adaptive machines.
Pearlmutter (1992). Gradient descent: Second-order momentum and saturating error.