by Masa-aki Sato
ATR Human Information Processing Research Laboratories
http://www.hip.atr.co.jp/~masaaki/fast_em.ps.gz
Abstract:
In this article, an on-line EM algorithm is derived for general Exponential Family models with Hidden variables (EFH models). It is proven that the on-line EM algorithm is equivalent to a stochastic gradient method with the inverse of the Fisher information matrix as a coefficient matrix. As a result, stochastic approximation theory guarantees convergence to a local maximum of the likelihood function. The performance of the on-line EM algorithm is examined using a mixture of Gaussians model, which is a special type of EFH model. The simulation results show that the on-line EM algorithm is much faster than both the batch EM algorithm and the on-line gradient ascent algorithm. The fast learning speed is achieved by the systematic design of the learning rate schedule. Moreover, it is shown that the on-line EM algorithm can escape from a local maximum of the likelihood function in the early training phase, even when the batch EM algorithm is trapped in a local
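To make the idea of on-line EM for a mixture of Gaussians concrete, here is a minimal sketch of a generic stepwise EM update in one dimension. It is not the algorithm or the learning rate schedule from the paper: the function name `online_em_gmm_1d`, the parameters `t0`, `kappa` and `min_var`, and the step size `(t + t0) ** -kappa` are all assumptions chosen for illustration. Each observation updates running averages of the expected sufficient statistics, and the parameters are re-estimated from those averages straight away.

```python
import math
import random


def online_em_gmm_1d(data, means, variances, weights,
                     t0=10.0, kappa=0.6, min_var=1e-6):
    """Generic stepwise (on-line) EM for a 1-D Gaussian mixture.

    Assumption: step size eta_t = (t + t0) ** -kappa, a Robbins-Monro style
    schedule picked for illustration, not the schedule used in the paper.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    # Running averages of expected sufficient statistics per component:
    # s0 ~ <r>, s1 ~ <r x>, s2 ~ <r x^2>, seeded from the initial parameters.
    s0 = list(weights)
    s1 = [w * m for w, m in zip(weights, means)]
    s2 = [w * (v + m * m) for w, m, v in zip(weights, means, variances)]

    for t, x in enumerate(data, start=1):
        # E-step for a single observation, done in log space for stability.
        logp = [
            math.log(w) - 0.5 * math.log(2.0 * math.pi * v)
            - 0.5 * (x - m) ** 2 / v
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logp)
        unnorm = [math.exp(lp - top) for lp in logp]
        total = sum(unnorm)
        resp = [u / total for u in unnorm]

        # Stochastic-approximation update of the statistics, then M-step.
        eta = (t + t0) ** (-kappa)
        for j in range(k):
            s0[j] += eta * (resp[j] - s0[j])
            s1[j] += eta * (resp[j] * x - s1[j])
            s2[j] += eta * (resp[j] * x * x - s2[j])
            weights[j] = s0[j]
            means[j] = s1[j] / s0[j]
            variances[j] = max(s2[j] / s0[j] - means[j] ** 2, min_var)
    return means, variances, weights


# Illustrative run on synthetic data drawn from two unit-variance Gaussians.
random.seed(0)
samples = [
    random.gauss(-3.0, 1.0) if random.random() < 0.5 else random.gauss(3.0, 1.0)
    for _ in range(4000)
]
fitted = online_em_gmm_1d(samples, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(fitted)
```

The decaying step size plays the role of the learning rate schedule mentioned in the abstract: larger early steps let the estimates move quickly, while smaller later steps average out the noise from individual samples.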
Citations
4344 | Maximum likelihood from incomplete data via the EM algorithm | Dempster, Laird, et al. | 1977
3051 | Neural Networks for Pattern Recognition | Bishop | 1995
593 | Hierarchical Mixtures of Experts and the EM algorithm | Jordan, Jacobs | 1993
569 | Adaptive Mixtures of Local Experts | Jacobs, Jordan, et al. | 1991
410 | "A New View of the EM Algorithm that Justifies Incremental and Other Variants", Learning in Graphical Models | Neal, Hinton | 1993
252 | A stochastic approximation method | Robbins, Monro | 1951
173 | Stochastic Approximation Algorithms and Applications | Kushner, Yin | 1997
79 | Information geometry of the EM and em algorithms for neural networks | Amari | 1995
77 | A theory of adaptive pattern classifiers | Amari | 1967
75 | Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures | Nowlan | 1991
49 | An alternative model for mixtures of experts | Xu, Jordan, et al. | 1995
41 | Natural gradient works efficiently | Amari | 1998
32 | On-line EM algorithm for the normalized Gaussian network | Sato, Ishii | 2000
23 | Differential Geometrical Methods in Statistics | Amari | 1985
8 | Online learning and stochastic approximations | Bottou | 1998
8 | Statistical study on on-line learning | Murata | 1999