Results 1  10
of
17
Representation learning: A review and new perspectives.
 of IEEE Conf. Comp. Vision Pattern Recog. (CVPR),
, 2005
"... AbstractThe success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract

Cited by 173 (4 self)
 Add to MetaCart
(Show Context)
AbstractThe success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representationlearning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Generalized denoising autoencoders as generative models
 In Advances in Neural Information Processing Systems 26 (NIPS’13
, 2013
"... Département d’informatique et recherche opérationnelle ..."
Abstract

Cited by 30 (8 self)
 Add to MetaCart
(Show Context)
Département d’informatique et recherche opérationnelle
Deep Generative Stochastic Networks Trainable by Backprop
, 2013
"... We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. Be ..."
Abstract

Cited by 28 (7 self)
 Add to MetaCart
We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. Successful experiments are conducted, validating these theoretical results, on two image datasets and with a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining. 1
Beating the perils of nonconvexity: Guaranteed training of neural networks using tensor methods. CoRR abs/1506.08473,
, 2015
"... Abstract Training neural networks is a challenging nonconvex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for training a twolayer neural network. We provide risk bounds for our prop ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Abstract Training neural networks is a challenging nonconvex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for training a twolayer neural network. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NPhard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild nondegeneracy conditions. It consists of simple embarrassingly parallel linear and multilinear operations, and is competitive with standard stochastic gradient descent (SGD), in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with general nonlinear activations.
On the equivalence between deep nade and generative stochastic networks
, 2014
"... Abstract. Neural Autoregressive Distribution Estimators (NADEs) have recently been shown as successful alternatives for modeling high dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for P (x). This issue has been recent ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Abstract. Neural Autoregressive Distribution Estimators (NADEs) have recently been shown as successful alternatives for modeling high dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for P (x). This issue has been recently addressed by a variant of NADE called Orderless NADEs and its deeper version, Deep Orderless NADE. Orderless NADEs are trained based on a criterion that stochastically maximizes P (x) with all possible orders of factorizations. Unfortunately, ancestral sampling from deep NADE is very expensive, corresponding to running through a neural net separately predicting each of the visible variables given some others. This work makes a connection between this criterion and the training criterion for Generative Stochastic Networks (GSNs). It shows that training NADEs in this way also trains a GSN, which defines a Markov chain associated with the NADE model. Based on this connection, we show an alternative way to sample from a trained Orderless NADE that allows to tradeoff computing time and quality of the samples: a 3 to 10fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved using a novel sampling procedure for GSNs called annealed GSN sampling, similar to tempering methods that combines fast mixing (obtained thanks to steps at high noise levels) with accurate samples (obtained thanks to steps at low noise levels). 1
Zerobias autoencoders and the benefits of coadapting features
, 2014
"... We show that training common regularized autoencoders resembles clustering, because it amounts to fitting a density model whose mass is concentrated in the directions of the individual weight vectors. We then propose a new activation function based on thresholding a linear function with zero bias ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
We show that training common regularized autoencoders resembles clustering, because it amounts to fitting a density model whose mass is concentrated in the directions of the individual weight vectors. We then propose a new activation function based on thresholding a linear function with zero bias (so it is truly linear not affine), and argue that this allows hidden units to “collaborate ” in order to define larger regions of uniform density. We show that the new activation function makes it possible to train autoencoders without an explicit regularization penalty, such as sparsification, contraction or denoising, by simply minimizing reconstruction error. Experiments in a variety of recognition tasks show that zerobias autoencoders perform about on par with common regularized autoencoders on low dimensional data and outperform these by an increasing margin as the dimensionality of the data increases. 1.
Multimodal transitions for generative stochastic networks. arXiv preprint arXiv:1312.5578
, 2013
"... Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data g ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution. The result of training is therefore a machine that generates samples through this Markov chain. However, the previously introduced GSN consistency theorems suggest that in order to capture a wide class of distributions, the transition operator in general should be multimodal, something that has not been done before this paper. We introduce for the first time multimodal transition distributions for GSNs, in particular using models in the NADE family (Neural Autoregressive Density Estimator) as output distributions of the transition operator. A NADE model is related to an RBM (and can thus model multimodal distributions) but its likelihood (and likelihood gradient) can be computed easily. The parameters of the NADE are obtained as a learned function of the previous state of the learned Markov chain. Experiments clearly illustrate the advantage of such multimodal transition distributions over unimodal GSNs. 1
Deeply coupled autoencoder networks for crossview classification. arXiv preprint arXiv:1402.2031
, 2014
"... The comparison of heterogeneous samples extensively exists in many applications, especially in the task of image classification. In this paper, we propose a simple but effective coupled neural network, called Deeply Coupled Autoencoder Networks (DCAN), which seeks to build two deep neural networks, ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
The comparison of heterogeneous samples extensively exists in many applications, especially in the task of image classification. In this paper, we propose a simple but effective coupled neural network, called Deeply Coupled Autoencoder Networks (DCAN), which seeks to build two deep neural networks, coupled with each other in every corresponding layers. In DCAN, each deep structure is developed via stacking multiple discriminative coupled autoencoders, a denoising autoencoder trained with maximum margin criterion consisting of intraclass compactness and interclass penalty. This single layer component makes our model simultaneously preserve the local consistency and enhance its discriminative capability. With increasing number of layers, the coupled networks can gradually narrow the gap between the two views. Extensive experiments on crossview image classification tasks demonstrate the superiority of our method over stateoftheart methods. 1
How autoencoders could provide credit assignment in deep networks via target propagation
, 2014
"... We propose to exploit reconstruction as a layerlocal training signal for deep learning. Reconstructions can be propagated in a form of target propagation playing a role similar to backpropagation but helping to reduce the reliance on derivatives in order to perform credit assignment across many l ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
We propose to exploit reconstruction as a layerlocal training signal for deep learning. Reconstructions can be propagated in a form of target propagation playing a role similar to backpropagation but helping to reduce the reliance on derivatives in order to perform credit assignment across many levels of possibly strong nonlinearities (which is difficult for backpropagation). A regularized autoencoder tends produce a reconstruction that is a more likely version of its input, i.e., a small move in the direction of higher likelihood. By generalizing gradients, target propagation may also allow to train deep networks with discrete hidden units. If the autoencoder takes both a representation of input and target (or of any side information) in input, then its reconstruction of input representation provides a target towards a representation that is more likely, conditioned on all the side information. A deep autoencoder decoding path generalizes gradient propagation in a learned way that can could thus handle not just infinitesimal changes but larger, discrete changes, hopefully allowing credit assignment through a long chain of nonlinear operations. In addition to each layer being a good autoencoder, the encoder also learns to please the upper layers by transforming the data into a space where it is easier to model by them, flattening manifolds and disentangling factors. The motivations and theoretical justifications for this approach are laid down in this paper, along with conjectures that will have to be verified either mathematically or experimentally, including a hypothesis stating that such autoencoder mediated target propagation could play in brains the role of credit assignment through many nonlinear, noisy and discrete transformations. 1
0 Many Regression Algorithms, One Unified Model – A Review
"... This is a preprint from 23.04.2015, and differs from the final published version. c©2015. This manuscript version is made available under the CCBYNCND 4.0 license ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
This is a preprint from 23.04.2015, and differs from the final published version. c©2015. This manuscript version is made available under the CCBYNCND 4.0 license