Representation learning: A review and new perspectives.
 In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
, 2013
Abstract

Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
The Manifold Tangent Classifier
Abstract

Cited by 32 (10 self)
We combine three important ideas present in previous work for building classifiers: the semi-supervised hypothesis (the input distribution contains information about the classifier), the unsupervised manifold hypothesis (data density concentrates near low-dimensional manifolds), and the manifold hypothesis for classification (different classes correspond to disjoint manifolds separated by low density). We exploit a novel algorithm for capturing manifold structure (high-order contractive autoencoders) and we show how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping. This representation-learning algorithm can be stacked to yield a deep architecture, and we combine it with a domain-knowledge-free version of the TangentProp algorithm to encourage the classifier to be insensitive to local directions of variation along the manifold. Record-breaking classification results are obtained.
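The atlas of charts described above can be illustrated with a small sketch: the local tangent directions at a point are the leading right singular vectors of the encoder's Jacobian. The encoder below uses random weights purely for illustration (in the paper they would come from a trained high-order contractive autoencoder); all shapes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encoder h(x) = tanh(W x + b); in the paper W, b would come from
# a trained (high-order) contractive autoencoder -- here they are random.
W = rng.standard_normal((20, 50))   # 20 hidden units, 50 input dims
b = rng.standard_normal(20)

def encoder_jacobian(x):
    """Jacobian dh/dx of h(x) = tanh(W x + b): diag(1 - h^2) @ W."""
    h = np.tanh(W @ x + b)
    return (1.0 - h**2)[:, None] * W   # shape (20, 50)

def tangent_chart(x, k=5):
    """Principal right singular vectors of the Jacobian: an (approximate)
    orthonormal basis for the local tangent space of the manifold at x."""
    J = encoder_jacobian(x)
    _, _, Vt = np.linalg.svd(J, full_matrices=False)
    return Vt[:k]                      # k leading tangent directions

x = rng.standard_normal(50)
T = tangent_chart(x, k=5)
print(T.shape)   # (5, 50); rows are orthonormal tangent directions
```

A TangentProp-style penalty would then discourage the classifier's output from changing along the rows of `T`.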
Estimating the Hessian by Backpropagating Curvature
Abstract

Cited by 6 (4 self)
In this work we develop Curvature Propagation (CP), a general technique for efficiently computing unbiased approximations of the Hessian of any function that is computed using a computational graph. At the cost of roughly two gradient evaluations, CP can give a rank-1 approximation of the whole Hessian, and can be repeatedly applied to give increasingly precise unbiased estimates of any or all of the entries of the Hessian. Of particular interest is the diagonal of the Hessian, for which no general approach is known to exist that is both efficient and accurate. We show in experiments that CP turns out to work well in practice, giving very accurate estimates of the Hessian of neural networks, for example, with a relatively small amount of work. We also apply CP to Score Matching, where the diagonal of the Hessian plays an integral role in the Score Matching objective, and where it is usually computed exactly using inefficient algorithms which do not scale to larger and more complex models.
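CP itself is specific to the paper; a minimal illustration of the same idea of unbiased stochastic Hessian estimates is the related Hutchinson-style diagonal estimator, which needs only Hessian-vector products: for a random vector v with i.i.d. ±1 entries, E[v ⊙ Hv] = diag(H). The quadratic test function below is an assumption chosen so the exact Hessian is known; for a neural network the Hessian-vector product would come from a second pass of backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known symmetric Hessian for a quadratic f(x) = 0.5 * x^T H x,
# so the exact Hessian-vector product is simply H v.
n = 8
M = rng.standard_normal((n, n))
H = M @ M.T                       # symmetric positive semi-definite

def hvp(v):
    # Stands in for an R-operator / double-backprop pass.
    return H @ v

def estimate_diag(num_samples=20000):
    """Unbiased estimate of diag(H): E[v * (H v)] = diag(H)
    when v has i.i.d. Rademacher (+/-1) entries."""
    acc = np.zeros(n)
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=n)
        acc += v * hvp(v)
    return acc / num_samples

est = estimate_diag()
print(np.max(np.abs(est - np.diag(H))))   # small estimation error
```

Averaging more samples shrinks the variance, mirroring the paper's point that repeated applications give increasingly precise unbiased estimates.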
Implicit density estimation by local moment matching to sample from autoencoders
, 2012
Abstract

Cited by 3 (3 self)
Recent work suggests that some autoencoder variants do a good job of capturing the local manifold structure of the unknown data-generating density. This paper contributes to the mathematical understanding of this phenomenon and helps define better-justified sampling algorithms for deep learning based on autoencoder variants. We consider an MCMC in which each step samples from a Gaussian whose mean and covariance matrix depend on the previous state, and which defines a target density through its asymptotic distribution. First, we show that good choices (in the sense of consistency) for these mean and covariance functions are the local expected value and local covariance under that target density. Then we show that an autoencoder with a contractive penalty captures estimators of these local moments in its reconstruction function and its Jacobian. A contribution of this work is thus a novel alternative to maximum-likelihood density estimation, which we call local moment matching. It also justifies a recently proposed sampling algorithm for the Contractive Auto-Encoder and extends it to the Denoising Auto-Encoder.
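A minimal sketch of such a chain, assuming a one-dimensional standard-normal target so that the ideal local mean and covariance are available in closed form (a trained denoising autoencoder would only approximate them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density: standard normal N(0, 1).  For Gaussian corruption
# x_tilde = x + sqrt(sigma2) * eps, the ideal denoising distribution
# P(x | x_tilde) is N(x_tilde / (1 + sigma2), sigma2 / (1 + sigma2)).
# These are the "local expected value" and "local covariance" the
# abstract refers to; here we plug in their analytic values.
sigma2 = 0.25

def sample_chain(x0=0.0, steps=200000):
    xs = np.empty(steps)
    x = x0
    for t in range(steps):
        x_tilde = x + np.sqrt(sigma2) * rng.standard_normal()   # corrupt
        mean = x_tilde / (1.0 + sigma2)                         # local mean
        var = sigma2 / (1.0 + sigma2)                           # local covariance
        x = mean + np.sqrt(var) * rng.standard_normal()         # denoise-sample
        xs[t] = x
    return xs

xs = sample_chain()
print(xs.mean(), xs.var())   # close to the target moments (0, 1)
```

Because each step corrupts and then samples from the exact denoising posterior, the stationary distribution of this toy chain is exactly the target; with a learned autoencoder the match is only approximate, which is what the consistency analysis addresses.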
Big Neural Networks Waste Capacity
 In International Conference on Learning Representations
, 2013
Abstract

Cited by 2 (0 self)
Département d’informatique et de recherche opérationnelle (Department of Computer Science and Operations Research)
Two-Layer Contractive Encodings for Learning Stable Nonlinear Features
, 2014
Abstract
Unsupervised learning of feature hierarchies is often a good strategy to initialize deep architectures for supervised learning. Most existing deep learning methods build these feature hierarchies layer by layer in a greedy fashion, using either autoencoders or restricted Boltzmann machines. Both yield encoders which compute linear projections of the input followed by a smooth thresholding function. In this work, we demonstrate that these encoders fail to find stable features when the required computation is in the exclusive-or class. To overcome this limitation, we propose a two-layer encoder which is less restricted in the type of features it can learn. The proposed encoder is regularized by an extension of previous work on contractive regularization. This two-layer contractive encoder potentially poses a more difficult optimization problem, and we further propose to linearly transform hidden neurons of the encoder to make learning easier. We demonstrate the advantages of the two-layer encoders qualitatively on artificially constructed datasets as well as on commonly used benchmark datasets. We also conduct experiments on a semi-supervised learning task and show the benefits of the proposed two-layer encoders trained with a linear transformation of perceptrons.
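The contractive regularizer mentioned above penalizes the squared Frobenius norm of the encoder's Jacobian with respect to the input. A minimal sketch for a hypothetical two-layer tanh encoder (random, untrained weights; all shapes are assumptions), checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer encoder h(x) = tanh(W2 tanh(W1 x)); in the
# paper the weights are learned, with ||dh/dx||_F^2 added to the
# reconstruction loss as a contractive regularizer.
W1 = 0.1 * rng.standard_normal((16, 10))
W2 = 0.1 * rng.standard_normal((8, 16))

def encode(x):
    return np.tanh(W2 @ np.tanh(W1 @ x))

def contractive_penalty(x):
    """Squared Frobenius norm of dh/dx via the chain rule through both layers."""
    h1 = np.tanh(W1 @ x)
    h2 = np.tanh(W2 @ h1)
    J1 = (1.0 - h1**2)[:, None] * W1          # dh1/dx, shape (16, 10)
    J = (1.0 - h2**2)[:, None] * W2 @ J1      # dh/dx,  shape (8, 10)
    return np.sum(J**2)

# Sanity check against a finite-difference Jacobian.
x = rng.standard_normal(10)
eps = 1e-6
J_fd = np.column_stack([
    (encode(x + eps * e) - encode(x - eps * e)) / (2 * eps)
    for e in np.eye(10)
])
print(abs(contractive_penalty(x) - np.sum(J_fd**2)))  # ~0
```

During training this penalty would be added to the reconstruction error, pushing the two-layer encoder toward features that are stable under small input perturbations.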