allows Hessian-vector products to be computed as cheaply as gradients
Abstract: Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr)f(w + rv)|_{r=0}, ...
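The definition R{f(w)} = (d/dr)f(w + rv)|_{r=0} can be illustrated with a minimal sketch. Applying R to the gradient of E gives Hv without ever forming H. The sketch below does this with forward-mode dual numbers on a toy error function; the `Dual` class, `grad_E`, and `hessian_vector_product` are hypothetical names chosen for illustration, and the toy function E(w) = w0^2 w1 + w1^3 is an assumption, not taken from the paper. It shows the operator's definition only, not the backpropagation equations the paper derives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Number of the form val + r * tan, truncated at first order in r."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + r a')(b + r b') = ab + r (a'b + ab') + O(r^2)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Promote a plain number to a Dual with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def grad_E(w):
    """Hand-written gradient of the toy error E(w) = w0^2 * w1 + w1^3."""
    w0, w1 = w
    return [2 * w0 * w1, w0 * w0 + 3 * w1 * w1]


def hessian_vector_product(grad, w, v):
    """Apply R{.} to the gradient: evaluate grad at w + r*v and read off d/dr at r = 0."""
    shifted = [Dual(wi, vi) for wi, vi in zip(w, v)]
    return [g.tan for g in grad(shifted)]


if __name__ == "__main__":
    w = [1.0, 2.0]
    v = [3.0, 4.0]
    # The explicit Hessian at w is [[2*w1, 2*w0], [2*w0, 6*w1]] = [[4, 2], [2, 12]],
    # so H v = [4*3 + 2*4, 2*3 + 12*4] = [20, 54].
    print(hessian_vector_product(grad_E, w, v))
```

The cost of one such product is a small constant multiple of one gradient evaluation, which is the property the abstract refers to when it says H can be treated as a generalized sparse matrix.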
Context of citations to this paper:
.... [4 7] for stochastic gradient descent rely on fast curvature matrix-vector products that can be obtained efficiently and automatically [7, 8]. Their calculation does not require explicit storage of the Hessian, which would be Proc. Intl. Conf. Artificial Neural Networks,...
...general than the traditional input-hidden-output layered architecture. This kind of formalism is attributed to Fernando Pineda by Pearlmutter [52]. [Figure A.1: Neuron in a feedforward network] ...
Cited by:
Online Independent Component Analysis with Local.. - Schraudolph.. (2000)
Iterative Scaled Trust-Region Learning in Krylov Subspaces.. - Mizutani, Demmel (2003)
On iterative Krylov-dogleg trust-region steps for solving neural .. - Mizutani (2000)
Similar documents (at the sentence level):
71.6%: Fast Exact Multiplication by the Hessian - Pearlmutter (1994)
Active bibliography (related documents):
0.4: Efficient Training of Feed-Forward Neural Networks - Møller (1997)
0.3: Learning with First, Second, and No Derivatives: a.. - Roberto Battiti.. (1994)
0.3: Investigation of the Use of Neural Networks for Computerised.. - Shane Dickson (1998)
Similar documents based on text:
0.1: Y. Le Cun. A theoretical framework for.. - Modelssummerschool..
0.1: Results of the Abbadingo One DFA Learning Competition.. - Lang, Pearlmutter, Price (1998)
0.1: Sharable versus Non-Sharable Real-Time Resources - Wu, Kaiser (1993)
Related documents from co-citation:
10: Second order derivatives for network pruning: Optimal brain surgeon - Hassibi, Stork - 1993
9: Increased Rates of Convergence Through Learning Rate Adaption - Jacobs - 1988
8: Advances in Neural Information Processing Systems - Touretzky, Mozer et al. - 1997
BibTeX entry:
B. A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 6(1):147-160, January 1994. http://citeseer.ist.psu.edu/pearlmutter94fast.html
@article{ pearlmutter94fast,
author = "Barak A. Pearlmutter",
title = "Fast Exact Multiplication by the {H}essian",
journal = "Neural Computation",
volume = "6",
number = "1",
pages = "147--160",
year = "1994",
url = "citeseer.ist.psu.edu/pearlmutter94fast.html" }
Citations (may not include all citations):
1535: Cambridge University Press - Press, Flannery et al. - 1988
376: A learning algorithm for Boltzmann Machines - Ackley, Hinton et al. - 1985
253: Beyond Regression: New Tools for Prediction and Analysis in .. - Werbos - 1974
180: Connectionist learning procedures - Hinton - 1987
130: Second order derivatives for network pruning: Optimal brain .. - Hassibi, Stork - 1993
95: The effective number of parameters: an analysis of generaliz.. - Moody - 1992
82: Generalization of back-propagation to recurrent neural netwo.. - Pineda - 1987
61: A practical Bayesian framework for back-prop networks - MacKay - 1991
58: Stationary and nonstationary learning characteristics of the.. - Widrow, McCool et al. - 1979
48: A learning rule for asynchronous perceptrons with feedback i.. - Almeida - 1987
42: Learning algorithms for connectionist networks: Applied grad.. - Watrous - 1987
30: Weight perturbation: An optimal architecture and learning te.. - Jabri, Flower - 1991
29: Exact calculation of the Hessian matrix for the multilayer p.. - Bishop - 1992
27: A scaled conjugate gradient algorithm for fast supervised le.. - Møller
17: A fast stochastic error-descent algorithm for supervised lea.. - Cauwenberghs - 1993
12: Advances in Neural Information Processing Systems 5 - Cowan, Giles - 1993
12: Summed weight neuron perturbation: An o - Flower, Jabri - 1993
10: A parallel gradient descent method for learning in analog VL.. - Alspector, Meir et al. - 1993
10: The eigenvalues of mega-dimensional matrices - Skilling - 1989
8: Advances in Neural Information Processing Systems - Moody, Hanson et al. - 1992
6: Analog VLSI implementation of gradient descent - Kirk, Kerns et al. - 1993
6: Learning continuous probability distributions with the contr.. - Movellan, McClelland - 1991
5: Calculating second derivatives on feedforward networks - Buntine, Weigend - 1991
5: Gradient descent: Second-order momentum and saturating error - Pearlmutter - 1992
4: Supervised learning on large redundant training sets - Møller
3: IEEE First International Conference on Neural Networks - Caudill, Butler - 1987
Documents on the same site (http://www.cs.unm.edu/~bap/publications.html):
Relating Egomotion and Image Evolution - Pearlmutter, Gurvits (1995)
VC Dimension of an Integrate-and-Fire Neuron Model - Zador (1996)
Automatic Learning Rate Maximization by On-Line.. - LeCun, Simard.. (1992)