### Citations

1533 | Gradient-based learning applied to document recognition
- LeCun, Bottou, et al.
- 1998
Citation Context: ...e been proposed previously (see Fukushima [2], for example), it has only been the past few years that have seen deep learning come into its own. Works begun by Hinton, Bengio, and LeCun (for example, [3]–[5]) have since been extended by many others, and a set of common characteristics defines an overall framework for deep learning. In particular, the word “deep” in this context refers to the fact tha...

1162 | A logical calculus of the ideas immanent in nervous activity
- McCulloch, Pitts
- 1943
Citation Context: ...lexities, and so we must rely on approximations to estimate them. Effectively all of the problems experienced over the history of connectionist networks, going all the way back to McCulloch and Pitts [13], have been caused by the fact that finding the optimal set of connection weights is intractable. The history of the field has largely been a progression of ever improving gradient descent-based appro...

850 | Training products of experts by minimizing contrastive divergence
- Hinton
- 2002
Citation Context: ...as found to be tractable, training was initially inefficient, and RBMs did not gain popularity for several years until Hinton et al. developed Contrastive Divergence, a method based on Gibbs Sampling [16]. Since then, RBMs are used widely as basic components of deep learning algorithms [17]–[19]. RBMs have also been successfully applied to classification tasks [20]–[22]. Moreover, RBMs have been appli...

796 | Reducing the Dimensionality of Data with Neural Networks
- Hinton, Salakhutdinov
- 2006
Citation Context: ...ty for several years until Hinton et al. developed Contrastive Divergence, a method based on Gibbs Sampling [16]. Since then, RBMs are used widely as basic components of deep learning algorithms [17]–[19]. RBMs have also been successfully applied to classification tasks [20]–[22]. Moreover, RBMs have been applied to many other learning tasks such as Collaborative Filtering [23]. The Contrastive Diverg...

468 | Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position
- Fukushima
- 1980
Citation Context: ...that bias is critical to understanding and improving our learning algorithms. A. Deep Learning While techniques similar to modern deep learning algorithms have been proposed previously (see Fukushima [2], for example), it has only been the past few years that have seen deep learning come into its own. Works begun by Hinton, Bengio, and LeCun (for example, [3]–[5]) have since been extended by many oth...

408 | Information processing in dynamical systems: Foundations of harmony theory
- Smolensky
- 1986
Citation Context: ...artitioning approach as a “deep feature extraction” method to help us learn better RBMs and to learn them faster as well. B. Restricted Boltzmann Machines The RBM model was first proposed by Smolensky [14] in 1986. As a type of Hopfield Network, an RBM is a generative model with visible nodes (x) and hidden nodes (h) as shown in Figure 1. There are no dependencies between hidden nodes, or between visib...

394 | Greedy layer-wise training of deep networks
- Bengio, Lamblin, et al.
- 2007
Citation Context: ...ularity for several years until Hinton et al. developed Contrastive Divergence, a method based on Gibbs Sampling [16]. Since then, RBMs are used widely as basic components of deep learning algorithms [17]–[19]. RBMs have also been successfully applied to classification tasks [20]–[22]. Moreover, RBMs have been applied to many other learning tasks such as Collaborative Filtering [23]. The Contrastive D...

389 | Learning long-term dependencies with gradient descent is difficult
- Bengio, Simard, et al.
- 1994
Citation Context: ...s were difficult to train, because of a credit assignment problem; standard error-backpropagation suffers from gradient diffusion if applied to a deep network, resulting in generally poor performance [6]. Most deep learning techniques now get around this problem by performing some form of “unsupervised pretraining,” which involves learning the weights to minimise reconstruction error for (unlabeled)...

336 | Learning deep architectures for AI
- Bengio
- 2009
Citation Context: ...tive and negative phases are repeated k times before the parameters are updated. The CD-1 algorithm (i.e., Contrastive Divergence with one step) has proven to be sufficient for many applications [24], [25]. CD-k is rarely used, because resetting the Markov chain after each parameter update is inefficient (as the model has already changed [24]). As an alternative, Tieleman modified the Contrastive Diver...

253 | Learning methods for generic object recognition with invariance to pose and lighting
- LeCun, Huang, et al.
- 2004
Citation Context: ...pe have demonstrated good performance for a number of traditionally difficult tasks, many in the domain of computer vision. Some examples are handwritten character recognition [3], object recognition [7], denoising [8], and reconstruction of missing or obscured information [8]. There has also been some work attempting to derive a theory to explain the success of deep learning. Erhan and Bengio [9] s...

229 | The need for biases in learning generalizations
- Mitchell
- 1980
Citation Context: ...ed-RBM to achieve this performance. In so doing, we hope to expose part of the representation bias of deep, vector-partitioning approaches in general. As Tom Mitchell pointed out in his seminal paper [1], not only is a bias necessary for learning, but examination of that bias is critical to understanding and improving our learning algorithms. A. Deep Learning While techniques similar to modern deep l...

220 | Restricted Boltzmann machines for collaborative filtering
- Salakhutdinov, Mnih, et al.
- 2007
Citation Context: ...earning algorithms [17]–[19]. RBMs have also been successfully applied to classification tasks [20]–[22]. Moreover, RBMs have been applied to many other learning tasks such as Collaborative Filtering [23]. The Contrastive Divergence (CD) method provides a reasonable approximation to the likelihood gradient of the energy function. Algorithm 1 shows pseudocode for training RBMs using a one step Contrast...

155 | Why does unsupervised pre-training help deep learning?
- Erhan
- 2010
Citation Context: ...n [7], denoising [8], and reconstruction of missing or obscured information [8]. There has also been some work attempting to derive a theory to explain the success of deep learning. Erhan and Bengio [9] suggest that unsupervised pre-training acts as a regularizer, and we have suggested in previous work [10] that it also takes advantage of spatially local statistical information in the training data...

151 | Training restricted Boltzmann machines using approximations to the likelihood gradient
- Tieleman
- 2008
Citation Context: ...e positive and negative phases are repeated k times before the parameters are updated. The CD-1 algorithm (i.e., Contrastive Divergence with one step) has proven to be sufficient for many applications [24], [25]. CD-k is rarely used, because resetting the Markov chain after each parameter update is inefficient (as the model has already changed [24]). As an alternative, Tieleman modified the Contrastive...
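
The CD-1 update referenced in these contexts can be sketched as follows. This is an illustrative NumPy implementation of one-step Contrastive Divergence for a binary RBM, not the paper's Algorithm 1; the variable names, learning rate, and batch handling are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b, c, x, lr=0.1):
    """One CD-1 update for a binary RBM.
    W: (n_visible, n_hidden) weights; b: visible bias; c: hidden bias.
    x: batch of binary visible vectors, shape (batch, n_visible)."""
    # Positive phase: hidden probabilities given the data
    ph_pos = sigmoid(x @ W + c)
    h = (rng.random(ph_pos.shape) < ph_pos).astype(float)  # sample h ~ p(h|x)
    # Negative phase: one Gibbs step down to the visible layer and back up
    pv_neg = sigmoid(h @ W.T + b)
    v_neg = (rng.random(pv_neg.shape) < pv_neg).astype(float)
    ph_neg = sigmoid(v_neg @ W + c)
    # Approximate likelihood gradient: positive minus negative statistics
    n = x.shape[0]
    W += lr * (x.T @ ph_pos - v_neg.T @ ph_neg) / n
    b += lr * (x - v_neg).mean(axis=0)
    c += lr * (ph_pos - ph_neg).mean(axis=0)
    return W, b, c
```

CD-k would repeat the negative-phase Gibbs step k times before computing the parameter update, which is the repetition the snippet above describes.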

99 | Classification using Discriminative Restricted Boltzmann Machines
- Larochelle, Bengio
- 2008

60 | Statistics for Spatial Data
- Cressie
- 1993
Citation Context: ... the same as, the correlation plots we use here). For a more in-depth treatment of variograms, correlograms, and spatial statistics in general, the reader is directed to Cressie’s book on the subject [30]. IV. EXPERIMENTS We used the MNIST dataset for our experiments due to its wide use in evaluating RBMs as well as a variety of deep learning algorithms. The MNIST database (Mixed National Institute of...

15 | Image denoising and inpainting with deep neural networks
- Xie, Xu, et al.
- 2012
Citation Context: ...rated good performance for a number of traditionally difficult tasks, many in the domain of computer vision. Some examples are handwritten character recognition [3], object recognition [7], denoising [8], and reconstruction of missing or obscured information [8]. There has also been some work attempting to derive a theory to explain the success of deep learning. Erhan and Bengio [9] suggest that uns...

14 | Training restricted Boltzmann machines on word observations
- Dahl, Adams, et al.
- 2012
Citation Context: ...ce, a method based on Gibbs Sampling [16]. Since then, RBMs are used widely as basic components of deep learning algorithms [17]–[19]. RBMs have also been successfully applied to classification tasks [20]–[22]. Moreover, RBMs have been applied to many other learning tasks such as Collaborative Filtering [23]. The Contrastive Divergence (CD) method provides a reasonable approximation to the likelihood...

13 | Neural abstraction pyramid: a hierarchical image understanding architecture
- Behnke, Rojas
- 1998
Citation Context: ...ectors before re-combining the analyzed data to form a representation of the full data vector. Convolutional Networks are the most well known of these, though there are several others, including [2], [11] and [12]. From a statistical and information-theoretic standpoint, this type of analysis seems like it should be highly detrimental to performance. After all, any statistical information relating two...

13 | The HTM Learning Algorithms
- George, Jaros
- 2007
Citation Context: ...fore re-combining the analyzed data to form a representation of the full data vector. Convolutional Networks are the most well known of these, though there are several others, including [2], [11] and [12]. From a statistical and information-theoretic standpoint, this type of analysis seems like it should be highly detrimental to performance. After all, any statistical information relating two features...

12 | Discovering Binary Codes for Documents by Learning Deep Generative Models
- Hinton, Salakhutdinov
- 2010

4 | Greedy layerwise training of deep belief networks
- Bengio, Lamblin, et al.
- 2007
Citation Context: ...en proposed previously (see Fukushima [2], for example), it has only been the past few years that have seen deep learning come into its own. Works begun by Hinton, Bengio, and LeCun (for example, [3]–[5]) have since been extended by many others, and a set of common characteristics defines an overall framework for deep learning. In particular, the word “deep” in this context refers to the fact that th...

4 | Classification of sets using Restricted Boltzmann Machines
- Louradour, Larochelle
- 2011
Citation Context: ... method based on Gibbs Sampling [16]. Since then, RBMs are used widely as basic components of deep learning algorithms [17]–[19]. RBMs have also been successfully applied to classification tasks [20]–[22]. Moreover, RBMs have been applied to many other learning tasks such as Collaborative Filtering [23]. The Contrastive Divergence (CD) method provides a reasonable approximation to the likelihood gradi...

3 | DOSI: Training artificial neural networks using overlapping swarm intelligence with local credit assignment
- Fortier, Sheppard, et al.
Citation Context: ...ase with random initialization. This enables the overall Partitioned-RBM hierarchy to achieve higher performance over a given training interval than a single traditional RBM. Note that Fortier et al. [27]–[29] suggest an approach to enabling and exploiting overlap that would also permit parallelization and distribution of the networks being optimized. Fig. 2. Example partitioning approach for an RBM I...

2 | A fast learning algorithm for deep belief nets
- Hinton, Osindero, et al.
- 2006

2 | A student’s guide to entropy
- Lemons
- 2013
Citation Context: ... in Figure 1. There are no dependencies between hidden nodes, or between visible nodes; thus, an RBM forms a complete bipartite graph. This model can be represented as a Boltzmann energy distribution [15], in which the probability distribution of the RBM is given as p(x, h) = exp(−E(x, h)) / Z, where the partition function Z = ∑_{x,h} exp(−E(x, h)) sums over all possible x and h vectors...
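
The Boltzmann distribution described in this context can be made concrete with a small enumeration. The energy function below, E(x, h) = −xᵀWh − bᵀx − cᵀh, is the standard binary-RBM form (the snippet is truncated before the paper's own definition), and the weights are arbitrary illustration values:

```python
import itertools
import numpy as np

def energy(x, h, W, b, c):
    # Standard binary RBM energy: E(x,h) = -x.W.h - b.x - c.h
    return -(x @ W @ h) - (b @ x) - (c @ h)

# Tiny RBM: 3 visible and 2 hidden units, so Z is computable by brute force
rng = np.random.default_rng(1)
W = rng.normal(scale=0.5, size=(3, 2))
b = np.zeros(3)
c = np.zeros(2)

configs_x = list(itertools.product([0, 1], repeat=3))
configs_h = list(itertools.product([0, 1], repeat=2))

# Partition function Z sums exp(-E) over every (x, h) configuration
Z = sum(np.exp(-energy(np.array(x), np.array(h), W, b, c))
        for x in configs_x for h in configs_h)

def p(x, h):
    return np.exp(-energy(np.array(x), np.array(h), W, b, c)) / Z

# Sanity check: the probabilities over all configurations sum to 1
total = sum(p(x, h) for x in configs_x for h in configs_h)
print(round(total, 6))
```

For realistic layer sizes this enumeration is intractable, which is exactly why the approximations discussed in the surrounding contexts (Contrastive Divergence) are needed.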

2 | The MNIST database of handwritten digits
- LeCun, Cortes, et al.
- 2014
Citation Context: ... and are then centered in the final 28×28 image, resulting in a white border around every image. See Figure 4 for some sample images. The MNIST dataset was introduced in [3], and can be obtained from [31]. We measure the performance of the RBMs using reconstruction error, which is defined to be the mean difference between the original and reconstructed images. We used a binary reconstruction error, us...
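
The binary reconstruction error described in this context could be computed as below. The snippet is truncated before the exact definition, so the 0.5 binarization threshold is an assumption:

```python
import numpy as np

def binary_reconstruction_error(original, reconstructed, threshold=0.5):
    """Mean per-pixel disagreement after binarizing both images.
    original, reconstructed: arrays of shape (n_images, n_pixels) in [0, 1]."""
    a = np.asarray(original) >= threshold
    r = np.asarray(reconstructed) >= threshold
    return float(np.mean(a != r))

# Toy example: two 4-pixel "images"; one pixel of the eight disagrees
orig = np.array([[1, 0, 1, 0], [0, 0, 1, 1]], dtype=float)
recon = np.array([[1, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
print(binary_reconstruction_error(orig, recon))  # 0.125
```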

1 | Deep structure learning: Beyond connectionist approaches
- Mitchell, Sheppard
- 2012
Citation Context: ...e work attempting to derive a theory to explain the success of deep learning. Erhan and Bengio [9] suggest that unsupervised pre-training acts as a regularizer, and we have suggested in previous work [10] that it also takes advantage of spatially local statistical information in the training data. While some deep learning techniques (such as Deep Belief Networks) treat all the elements of an input vec...

1 | Training restricted Boltzmann machines with overlapping partitions
- Tosun, Sheppard
Citation Context: ...ce using persistent Markov chains. II. PARTITIONED RBMS To improve the performance of RBMs, Tosun and Sheppard proposed a training method for RBMs that partitions the network into several subnetworks [26] that are trained independently, and incrementally re-combined until only a single, all-inclusive partition is left. With the partitioned RBM method, training involves several levels of partitioning a...
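
The partition-then-recombine schedule described in this context can be sketched as follows. This is only an illustration of the idea: the contiguous, non-overlapping split and the 4 → 2 → 1 schedule are assumptions, and the actual partitioning and overlap scheme of [26] may differ:

```python
import numpy as np

def partition_indices(n_visible, n_parts):
    """Split visible-unit indices into contiguous, equal-as-possible parts.
    Each part would define the visible layer of one independently trained sub-RBM."""
    return np.array_split(np.arange(n_visible), n_parts)

# A 784-pixel MNIST vector split into progressively fewer subnetworks,
# mimicking the incremental re-combination until one all-inclusive partition remains
for n_parts in (4, 2, 1):
    parts = partition_indices(784, n_parts)
    print(n_parts, [len(p) for p in parts])
```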

1 | Abductive inference in Bayesian networks using overlapping swarm intelligence
- Fortier, Sheppard, et al.
- 2014