The Helmholtz machine is a new unsupervised learning architecture that uses top-down connections to build probability density models of input and bottom-up connections to build inverses to those models. The wake-sleep learning algorithm for the machine involves only the purely local delta rule. This paper suggests a number of different varieties of Helmholtz machines, each with its own strengths and weaknesses, and relates them to cortical information processing.
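To make the two sets of connections and the local delta rule concrete, the following is a minimal sketch of wake-sleep learning, not the paper's implementation. It assumes the simplest possible variety: one layer of binary stochastic hidden units, logistic activations, and a fixed learning rate. The class name `TinyHelmholtzMachine` and all parameter names are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    # Draw one binary state per unit from its firing probability.
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def layer_probs(weights, bias, inputs):
    # weights[j][i] connects input unit i to output unit j.
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, bias)]


def delta_update(weights, bias, inputs, targets, preds, lr):
    # Local delta rule: each weight changes by
    # lr * (target - prediction) * presynaptic activity.
    for j, (t, p) in enumerate(zip(targets, preds)):
        err = lr * (t - p)
        bias[j] += err
        for i, x in enumerate(inputs):
            weights[j][i] += err * x


class TinyHelmholtzMachine:
    """Hypothetical one-hidden-layer machine (an assumption of this sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Bottom-up recognition connections (visible -> hidden).
        self.rec_w = [[0.0] * n_visible for _ in range(n_hidden)]
        self.rec_b = [0.0] * n_hidden
        # Top-down generative connections (hidden -> visible).
        self.gen_w = [[0.0] * n_hidden for _ in range(n_visible)]
        self.gen_b = [0.0] * n_visible
        # Generative biases that set the prior over hidden states.
        self.prior_b = [0.0] * n_hidden

    def wake(self, data, lr):
        # Recognition weights pick a hidden state for a real input;
        # the generative weights are then trained to reconstruct it.
        h = sample(layer_probs(self.rec_w, self.rec_b, data), self.rng)
        for j, hj in enumerate(h):
            self.prior_b[j] += lr * (hj - sigmoid(self.prior_b[j]))
        preds = layer_probs(self.gen_w, self.gen_b, h)
        delta_update(self.gen_w, self.gen_b, h, data, preds, lr)

    def sleep(self, lr):
        # The generative model produces a "fantasy" input; the recognition
        # weights are trained to recover the hidden state that caused it.
        h = sample([sigmoid(b) for b in self.prior_b], self.rng)
        v = sample(layer_probs(self.gen_w, self.gen_b, h), self.rng)
        preds = layer_probs(self.rec_w, self.rec_b, v)
        delta_update(self.rec_w, self.rec_b, v, h, preds, lr)
        return v


if __name__ == "__main__":
    machine = TinyHelmholtzMachine(n_visible=4, n_hidden=2, seed=1)
    for _ in range(200):
        machine.wake([1.0, 1.0, 0.0, 0.0], lr=0.1)
        machine.sleep(lr=0.1)
    print(machine.gen_b)
```

In this sketch every update uses only the activities of the two units a connection joins plus a prediction error at the receiving unit, which is the sense in which the rule is local. The alternation of one wake step and one sleep step per iteration is a design choice of the example, not something the abstract specifies.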