
Fast Exact Multiplication by the Hessian (1994)  (36 citations)
Barak A. Pearlmutter
Neural Computation



Comment: This technique allows Hessian-vector products to be computed about as cheaply as gradients.

Abstract: Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr)f(w + rv)|_{r=0},...
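
The operator in the abstract can be illustrated with a minimal sketch, assuming a toy error function rather than a neural network. Applying R to the gradient gives Hv, since (d/dr) grad E(w + rv) at r = 0 equals Hv. The Python below evaluates a hand-written gradient at w + rv using dual numbers (r*r = 0) and reads off the d/dr part. The names Dual, grad_E, and hessian_vector_product, and the error E(w) = w0^2 * w1 + w1^3, are hypothetical choices made for illustration and do not come from the paper.

```python
class Dual:
    """A number val + dot*r with r*r = 0; dot carries d/dr at r = 0."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def lift(x):
        # Promote plain numbers to Dual with a zero derivative part.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule for the derivative part.
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def grad_E(w):
    """Hand-written gradient of the toy error E(w) = w0^2 * w1 + w1^3."""
    w0, w1 = w
    return [2 * w0 * w1, w0 * w0 + 3 * w1 * w1]


def hessian_vector_product(grad, w, v):
    """Compute Hv as R{grad(w)} = (d/dr) grad(w + r*v) at r = 0."""
    shifted = [Dual(wi, vi) for wi, vi in zip(w, v)]
    return [Dual.lift(g).dot for g in grad(shifted)]


if __name__ == "__main__":
    # For this E, H = [[2*w1, 2*w0], [2*w0, 6*w1]].
    print(hessian_vector_product(grad_E, [1.0, 2.0], [3.0, 4.0]))
```

The Hessian is never formed: one pass through the gradient code with dual-number inputs yields the whole product, which is the sense in which H can be treated as a generalized sparse matrix.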

Context of citations to this paper:

.... [4 7] for stochastic gradient descent rely on fast curvature matrix-vector products that can be obtained efficiently and automatically [7, 8]. Their calculation does not require explicit storage of the Hessian, which would be ...

...general than the traditional input-hidden-output layered architecture. This kind of formalism is attributed to Fernando Pineda by Pearlmutter [52]. [Figure A.1: Neuron in a feedforward network] A special notation will be used to...

Cited by:
Online Independent Component Analysis with Local.. - Schraudolph.. (2000)
Iterative Scaled Trust-Region Learning in Krylov Subspaces.. - Mizutani, Demmel (2003)
On iterative Krylov-dogleg trust-region steps for solving neural .. - Mizutani (2000)

Similar documents (at the sentence level):
71.6%:   Fast Exact Multiplication by the Hessian - Pearlmutter (1994)

Active bibliography (related documents):
0.4:   Efficient Training of Feed-Forward Neural Networks - Møller (1997)
0.3:   Learning with First, Second, and No Derivatives: a.. - Roberto Battiti.. (1994)
0.3:   Investigation of the Use of Neural Networks for Computerised.. - Shane Dickson (1998)

Similar documents based on text:
0.1:   Y. Le Cun. A theoretical framework for.. - Modelssummerschool..
0.1:   Results of the Abbadingo One DFA Learning Competition.. - Lang, Pearlmutter, Price (1998)
0.1:   Sharable versus Non-Sharable Real-Time Resources - Wu, Kaiser (1993)

Related documents from co-citation:
10:   Second order derivatives for network pruning: Optimal brain surgeon - Hassibi, Stork - 1993
9:   Increased Rates of Convergence Through Learning Rate Adaption - Jacobs - 1988
8:   Advances in Neural Information Processing Systems - Touretzky, Mozer et al. - 1997

BibTeX entry:

B. A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 6(1):147-160, January 1994. http://citeseer.ist.psu.edu/pearlmutter94fast.html

@article{ pearlmutter94fast,
    author = "Barak A. Pearlmutter",
    title = "Fast Exact Multiplication by the {H}essian",
    journal = "Neural Computation",
    volume = "6",
    number = "1",
    pages = "147--160",
    year = "1994",
    url = "citeseer.ist.psu.edu/pearlmutter94fast.html" }
Citations (may not include all citations):
1535   Cambridge University Press - Press, Flannery et al. - 1988
376   A learning algorithm for Boltzmann Machines - Ackley, Hinton et al. - 1985
253   Beyond Regression: New Tools for Prediction and Analysis in .. - Werbos - 1974
180   Connectionist learning procedures - Hinton - 1987
130   Second order derivatives for network pruning: Optimal brain .. - Hassibi, Stork - 1993
95   The effective number of parameters: an analysis of generaliz.. - Moody - 1992
82   Generalization of back-propagation to recurrent neural netwo.. - Pineda - 1987
61   A practical Bayesian framework for back-prop networks - MacKay - 1991
58   Stationary and nonstationary learning characteristics of the.. - Widrow, McCool et al. - 1979
48   A learning rule for asynchronous perceptrons with feedback i.. - Almeida - 1987
42   Learning algorithms for connectionist networks: Applied grad.. - Watrous - 1987
30   Weight perturbation: An optimal architecture and learning te.. - Jabri, Flower - 1991
29   Exact calculation of the Hessian matrix for the multilayer p.. - Bishop - 1992
27   A scaled conjugate gradient algorithm for fast supervised le.. - Møller
17   A fast stochastic error-descent algorithm for supervised lea.. - Cauwenberghs - 1993
12   Advances in Neural Information Processing Systems 5 - Cowan, Giles - 1993
12   Summed weight neuron perturbation: An o - Flower, Jabri - 1993
10   A parallel gradient descent method for learning in analog VL.. - Alspector, Meir et al. - 1993
10   The eigenvalues of mega-dimensional matrices - Skilling - 1989
8   Advances in Neural Information Processing Systems - Moody, Hanson et al. - 1992
6   Analog VLSI implementation of gradient descent - Kirk, Kerns et al. - 1993
6   Learning continuous probability distributions with the contr.. - Movellan, McClelland - 1991
5   Calculating second derivatives on feedforward networks - Buntine, Weigend - 1991
5   Gradient descent: Second-order momentum and saturating error - Pearlmutter - 1992
4   Supervised learning on large redundant training sets - Møller
3   IEEE First International Conference on Neural Networks - Caudill, Butler - 1987





Documents on the same site (http://www.cs.unm.edu/~bap/publications.html):
Relating Egomotion and Image Evolution - Pearlmutter, Gurvits (1995)
VC Dimension of an Integrate-and-Fire Neuron Model - Zador (1996)
Automatic Learning Rate Maximization by On-Line.. - LeCun, Simard.. (1992)
