Natural Gradient
2013
"... In derivative-free optimization one aims at minimizing an unknown objective function. The only accessible information is the set of algorithm-selected function measurements. Evolution Strategies (ES) are among the state-of-the-art heuristics for this optimization problem. ES typically use parametrized probability distributions to generate correlated samples in promising regions. Recently, it was shown that applying gradient descent in the parameter space of the search distribution leads to algorithms that are very similar to the most successful ES. The development of those so-called Natural Evolution ..."
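Descending along the gradient in the parameter space of the search distribution can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm of the cited paper: the search distribution is an isotropic Gaussian N(mu, sigma^2 I) with a fixed sigma, only the mean is adapted, and mirrored (antithetic) sampling is used to reduce variance. Under these assumptions the Fisher information for mu is I / sigma^2, so the natural gradient is sigma^2 times the plain search gradient. The names `sphere` and `nes_mean_only` are hypothetical.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def nes_mean_only(f, mu, sigma=0.5, lr=1.0, pop=50, iters=100, seed=0):
    """Natural-gradient descent on the mean of N(mu, sigma^2 I).

    Simplifying assumption: sigma is fixed and only mu is adapted, so the
    Fisher matrix for mu is I / sigma^2 and its inverse is sigma^2 * I.
    """
    rng = random.Random(seed)
    mu = list(mu)
    d = len(mu)
    for _ in range(iters):
        grad = [0.0] * d
        for _ in range(pop):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            # Mirrored pair x = mu +/- sigma * z.
            f_plus = f([m + sigma * zi for m, zi in zip(mu, z)])
            f_minus = f([m - sigma * zi for m, zi in zip(mu, z)])
            # Plain search gradient E[f(x) z] / sigma, estimated per mirrored pair.
            for i in range(d):
                grad[i] += (f_plus - f_minus) / (2.0 * sigma) * z[i] / pop
        # Natural gradient = F^{-1} grad = sigma^2 * grad.
        mu = [m - lr * sigma ** 2 * g for m, g in zip(mu, grad)]
    return mu


if __name__ == "__main__":
    mu = nes_mean_only(sphere, [3.0, -2.0])
    print(sphere(mu))  # close to 0
```

Full NES variants also adapt sigma (or a full covariance matrix) and usually apply fitness shaping; both are left out here to keep the natural-gradient step visible.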
Rprop Using the Natural Gradient
2005
"... Gradient-based optimization algorithms are the standard methods for adapting the weights of neural networks. The natural gradient gives the steepest descent direction based on a non-Euclidean metric in the weight space, one that is more appropriate from a theoretical point of view. While the natural gradient ..."
Cited by 8 (1 self)
WHY NATURAL GRADIENT?
"... Gradient adaptation is a useful technique for adjusting a set of parameters to minimize a cost function. While often easy to implement, the convergence speed of gradient adaptation can be slow when the slope of the cost function varies widely for small changes in the parameters. In this paper, we outline an alternative technique, termed natural gradient adaptation, that overcomes the poor convergence properties of gradient adaptation in many cases. The natural gradient is based on differential geometry and employs knowledge of the Riemannian structure of the parameter space to adjust ..."
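As a small illustration of how a Riemannian metric changes the update, consider fitting a one-dimensional Gaussian N(mu, sigma^2) by minimizing the average negative log-likelihood. This hypothetical example is not taken from the cited paper: it uses the Fisher information of one observation in (mu, sigma) coordinates, diag(1/sigma^2, 2/sigma^2), as the metric and preconditions the Euclidean gradient with its inverse. The function name `natural_gradient_gaussian_fit` and the data are assumptions.

```python
def natural_gradient_gaussian_fit(xs, mu=0.0, sigma=1.0, lr=0.5, iters=100):
    """Fit N(mu, sigma^2) by natural-gradient descent on the mean NLL.

    Metric (assumption for this sketch): per-sample Fisher information in
    (mu, sigma) coordinates, diag(1 / sigma^2, 2 / sigma^2).
    """
    n = len(xs)
    for _ in range(iters):
        mean_dev = sum(x - mu for x in xs) / n          # mean(x - mu)
        m2 = sum((x - mu) ** 2 for x in xs) / n         # mean((x - mu)^2)
        # Euclidean gradient of the average negative log-likelihood.
        g_mu = -mean_dev / sigma ** 2
        g_sigma = 1.0 / sigma - m2 / sigma ** 3
        # Natural gradient: multiply by the inverse Fisher matrix.
        nat_mu = sigma ** 2 * g_mu
        nat_sigma = (sigma ** 2 / 2.0) * g_sigma
        mu, sigma = mu - lr * nat_mu, sigma - lr * nat_sigma
    return mu, sigma


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(natural_gradient_gaussian_fit(data))  # approximately (5.0, 2.0)
```

The plain gradient step for mu is scaled by 1/sigma^2, so it overshoots when sigma is small and crawls when sigma is large; the natural gradient removes that dependence. With lr=1.0, the mu update lands on the sample mean in one step and the sigma update becomes sigma <- (sigma^2 + m2) / (2 sigma), which is Newton's iteration for the square root of m2.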
Natural Gradients for Deformable Registration
"... We apply the concept of natural gradients to deformable registration. The motivation stems from the lack of physical interpretation for gradients of image-based difference measures. The main idea is to endow the space of deformations with a distance metric which reflects the variation of the difference ..."
Cited by 4 (1 self)
Multichannel Blind Deconvolution and Equalization Using the Natural Gradient
In The First Signal Processing Workshop on Signal Processing Advances in Wireless Communications, 1997
"... Multichannel deconvolution and equalization is an important task for numerous applications in communications, signal processing, and control. In this paper, we extend the efficient natural gradient search method in [1] to derive a set of online algorithms for combined multichannel blind source separation ..."
Cited by 120 (24 self)
Natural Gradient Matrix Momentum
In Proceedings of the Ninth International Conference on Neural Networks, The Institution of Electrical Engineers, 1999
"... Natural gradient learning is an efficient and principled method for improving online learning. In practical applications, estimating and inverting the Fisher information matrix adds to the cost. We propose to use the matrix momentum algorithm in order to carry out efficient ..."
Cited by 1 (0 self)
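The cost mentioned above comes from forming and inverting the Fisher information matrix. This sketch shows that direct approach for a hypothetical two-parameter logistic-regression model p = sigmoid(w*x + b): it estimates the Fisher matrix as the sample average of p(1 - p) phi phi^T with phi = (x, 1), adds a small damping term (an assumption), and solves F v = g. It is the baseline that methods such as matrix momentum aim to make cheaper, not the matrix momentum algorithm itself; with d parameters the matrix has d^2 entries and a direct solve costs O(d^3). The names `sigmoid` and `natural_gradient_step` are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def natural_gradient_step(w, b, xs, ys, damping=1e-3):
    """Natural-gradient direction for 1-D logistic regression p = sigmoid(w*x + b).

    Fisher estimate: average of p(1-p) * phi phi^T with phi = (x, 1),
    plus damping * I (assumption) to keep it invertible.
    Returns (v, F, g) with F v = g.
    """
    n = len(xs)
    g = [0.0, 0.0]
    f = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        phi = (x, 1.0)
        for i in range(2):
            # Gradient of the average negative log-likelihood.
            g[i] += (p - y) * phi[i] / n
            for j in range(2):
                f[i][j] += p * (1.0 - p) * phi[i] * phi[j] / n
    f[0][0] += damping
    f[1][1] += damping
    # Solve the 2x2 system F v = g with Cramer's rule.
    det = f[0][0] * f[1][1] - f[0][1] * f[1][0]
    v = [(f[1][1] * g[0] - f[0][1] * g[1]) / det,
         (f[0][0] * g[1] - f[1][0] * g[0]) / det]
    return v, f, g


if __name__ == "__main__":
    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ys = [0, 0, 1, 0, 1, 1]  # hypothetical data, not linearly separable
    w, b = 0.0, 0.0
    for _ in range(10):
        v, _, _ = natural_gradient_step(w, b, xs, ys)
        w, b = w - v[0], b - v[1]
    print(w, b)
```

For logistic regression this Fisher matrix coincides with the Hessian of the average negative log-likelihood, so each step behaves like a damped Newton step.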
Topmoumoute online natural gradient algorithm
Advances in Neural Information Processing Systems 20, 2008
"... Guided by the goal of obtaining an optimization algorithm that is both fast and yields good generalization, we study the descent direction maximizing the decrease in generalization error or the probability of not increasing generalization error. The surprising result is that from both the Bayesian and frequentist perspectives this can yield the natural gradient direction. Although that direction can be very expensive to compute, we develop an efficient, general, online approximation to the natural gradient descent which is suited to large-scale problems. We report experimental results ..."
Cited by 29 (7 self)
Revisiting natural gradient for deep networks
In International Conference on Learning Representations, 2014
"... We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods: Hessian-Free (Martens, 2010), Krylov Subspace Descent ..."
Cited by 10 (5 self)
Natural-gradient adaptation
Unsupervised Adaptive Filtering, Vol. I: Blind Source Separation, 2000
"aware multigrid for variable coefficient elliptic problems on"
Cited by 19 (0 self)
A new learning algorithm for blind signal separation
1996
"... A new online learning algorithm which minimizes a statistical dependency among outputs is derived for blind separation of mixed signals. The dependency is measured by the average mutual information (MI) of the outputs. The source signals and the mixing matrix are unknown except for the number of the sources. The Gram-Charlier expansion instead of the Edgeworth expansion is used in evaluating the MI. The natural gradient approach is used to minimize the MI. A novel activation function is proposed for the online learning algorithm which has an equivariant property and is easily implemented on a neural ..."
Cited by 622 (80 self)
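The natural-gradient rule associated with this line of work is usually written as W <- W + eta * (I - phi(y) y^T) W, where y = W x are the current outputs; multiplying by W on the right is what gives the update its equivariant property. The batch sketch below is a hedged illustration: phi = tanh is an assumed nonlinearity suited to super-Gaussian sources, not the Gram-Charlier-based activation proposed in the paper, and the helper names `laplace` and `natural_gradient_ica` are hypothetical.

```python
import math
import random


def laplace(rng):
    # Unit-scale Laplacian sample, used as a super-Gaussian source (assumption).
    return rng.expovariate(1.0) * rng.choice((-1.0, 1.0))


def natural_gradient_ica(xs, lr=0.1, iters=300):
    """Batch natural-gradient separation of 2 mixtures: W <- W + lr * (I - E[phi(y) y^T]) W.

    phi = tanh is an assumed nonlinearity for super-Gaussian sources.
    """
    n = len(xs)
    w = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(iters):
        # c = E[phi(y) y^T], estimated over the whole batch.
        c = [[0.0, 0.0], [0.0, 0.0]]
        for x in xs:
            y = [w[i][0] * x[0] + w[i][1] * x[1] for i in range(2)]
            for i in range(2):
                t = math.tanh(y[i])
                for j in range(2):
                    c[i][j] += t * y[j] / n
        # h = I - c; right-multiplying by W is the natural-gradient form.
        h = [[(1.0 if i == j else 0.0) - c[i][j] for j in range(2)] for i in range(2)]
        dw = [[sum(h[i][k] * w[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
        w = [[w[i][j] + lr * dw[i][j] for j in range(2)] for i in range(2)]
    return w
```

To try it, mix two independent `laplace` streams with a made-up matrix A, run `natural_gradient_ica` on the mixtures, and inspect the global matrix W A: after convergence it should be close to a scaled permutation, meaning each output tracks one source.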
