Results 1–10 of 29
Representation learning: A review and new perspectives.
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2013
Abstract

Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Domain adaptation for large-scale sentiment classification: A deep learning approach
 In Proceedings of the Twenty-Eighth International Conference on Machine Learning, ICML
, 2011
Abstract

Cited by 85 (7 self)
The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classifiers, whereby a system is trained on labeled reviews from one source domain but is meant to be deployed on another. We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. Sentiment classifiers trained with this high-level feature representation clearly outperform state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products. Furthermore, this method scales well and allowed us to successfully perform domain adaptation on a larger industrial-strength dataset of 22 domains.
ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Abstract

Cited by 42 (3 self)
Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonormality constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high-dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performance on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets.
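The soft reconstruction penalty at the heart of this approach replaces ICA's hard orthonormality constraint with a term that vanishes exactly when the filters are orthonormal. A minimal sketch of the idea (names such as `rica_cost` and the weight `lam` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unwhitened data: n samples of dimension d (columns are examples).
d, n, k = 8, 500, 16          # k > d would give an overcomplete feature set
X = rng.normal(size=(d, n)) * rng.uniform(0.5, 2.0, size=(d, 1))

def rica_cost(W, X, lam=0.5, eps=1e-8):
    """Soft-reconstruction ICA cost (a sketch of the paper's idea):
    a sparsity term on the features plus a penalty ||W^T W x - x||^2
    that replaces the hard orthonormality constraint of standard ICA."""
    WX = W @ X                                           # features, (k, n)
    sparsity = np.sqrt(WX**2 + eps).sum() / X.shape[1]   # smooth L1
    recon = W.T @ WX - X                                 # reconstruction residual
    return sparsity + lam * (recon**2).sum() / X.shape[1]

# With a complete (k = d) orthonormal W, W^T W = I, so the reconstruction
# term vanishes and the cost reduces to the plain ICA sparsity term.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
cost_orth = rica_cost(Q, X)
```

Because the penalty is soft, `W` can be rectangular (overcomplete, `k > d`) and the whole objective can be handed to an unconstrained optimizer, which is the scaling advantage the abstract describes.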
Practical recommendations for gradient-based training of deep architectures
 Neural Networks: Tricks of the Trade
, 2013
Deep Learning of Representations for Unsupervised and Transfer Learning
 WORKSHOP ON UNSUPERVISED AND TRANSFER LEARNING
, 2012
Abstract

Cited by 24 (7 self)
Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. The objective is to make these higher-level representations more abstract, with their individual features more invariant to most of the variations that are typically present in the training distribution, while collectively preserving as much as possible of the information in the input. Ideally, we would like these representations to disentangle the unknown factors of variation that underlie the training distribution. Such unsupervised learning of representations can be exploited usefully under the hypothesis that the input distribution P(x) is structurally related to some task of interest, say predicting P(y|x). This paper focuses on the context of the Unsupervised and Transfer Learning Challenge, on why unsupervised pretraining of representations can be useful, and how it can be exploited in the transfer learning scenario, where we care about predictions on examples that are not from the same distribution as the training distribution.
On Autoencoders and Score Matching for Energy Based Models
Abstract

Cited by 17 (2 self)
We consider estimation methods for the class of continuous-data energy-based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.
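For background on the estimator this abstract builds on: score matching (Hyvärinen, 2005) fits the energy $E_\theta$ of $p_\theta(x) \propto \exp(-E_\theta(x))$ without evaluating the partition function. A standard statement of the objective, included here only as context for the autoencoder correspondence, is:

```latex
J(\theta) \;=\; \mathbb{E}_{p(x)} \sum_{i} \left[ \tfrac{1}{2}\left(\frac{\partial E_\theta(x)}{\partial x_i}\right)^{2} \;-\; \frac{\partial^{2} E_\theta(x)}{\partial x_i^{2}} \right]
```

The partition function $Z$ drops out because $J$ depends on the model only through derivatives of $\log p_\theta(x) = -E_\theta(x) - \log Z$ with respect to $x$, and $\log Z$ is constant in $x$.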
What regularized autoencoders learn from the data generating distribution
, 2012
Abstract

Cited by 17 (7 self)
What do autoencoders learn about the underlying data generating distribution? Recent work suggests that some autoencoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the autoencoder captures the score (derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the autoencoder: they show what the autoencoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising autoencoder training criterion with small corruption noise, but with contraction applied on the whole reconstruction function rather than just the encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
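The central relation, r(x) − x ≈ σ² ∂log p(x)/∂x for small corruption noise σ², can be checked numerically in a case where the optimal denoiser is known in closed form: Gaussian data with Gaussian corruption. A sketch (variable names are illustrative):

```python
import numpy as np

# For data x ~ N(0, s^2) corrupted as x' = x + N(0, sigma^2), the optimal
# (MMSE) denoising reconstruction is linear: r(x') = x' * s^2 / (s^2 + sigma^2).
s2, sigma2 = 1.0, 0.01          # small corruption noise, as in the theorem
r = lambda x: x * s2 / (s2 + sigma2)

# The paper's result says (r(x) - x) / sigma^2 estimates the score
# d/dx log p(x), which for N(0, s^2) is exactly -x / s^2.
xs = np.linspace(-3, 3, 13)
score_est = (r(xs) - xs) / sigma2   # works out to -x / (s^2 + sigma^2)
score_true = -xs / s2
max_err = np.abs(score_est - score_true).max()
```

The estimate differs from the true score only by the factor s²/(s² + σ²), which tends to 1 as the corruption noise shrinks, matching the small-noise limit in the theorem.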
Deep AutoRegressive Networks
, 2014
Abstract

Cited by 15 (4 self)
We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.
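The "sampled from quickly and exactly via ancestral sampling" claim rests on the autoregressive structure: each unit conditions only on units sampled before it, so one left-to-right pass yields an exact sample. A minimal sketch of one such stochastic binary layer (a hypothetical simplification, not the paper's full architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def sample_autoregressive_layer(W, b, rng):
    """Ancestral sampling of a stochastic binary layer with
    autoregressive (strictly lower-triangular) connections:
        h_i ~ Bernoulli(sigmoid(b_i + sum_{j<i} W[i, j] * h_j)).
    Unit i sees only the already-sampled h_0..h_{i-1}, so the
    single sweep produces an exact sample from the layer's joint."""
    n = b.shape[0]
    h = np.zeros(n)
    for i in range(n):
        p = sigmoid(b[i] + W[i, :i] @ h[:i])
        h[i] = rng.random() < p
    return h

n = 10
W = np.tril(rng.normal(size=(n, n)), k=-1)   # strictly lower-triangular
b = rng.normal(size=n)
h = sample_autoregressive_layer(W, b, rng)
```

Stacking several such layers and sampling them top-down gives the ancestral sampling scheme the abstract refers to; no Markov chain or rejection step is needed.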
On the expressive power of deep architectures
 In ALT’2011
, 2011
Abstract

Cited by 14 (7 self)
Deep architectures are families of functions corresponding to deep circuits. Deep learning algorithms are based on parametrizing such circuits and tuning their parameters so as to approximately optimize some training objective. Whereas it was once thought too difficult to train deep architectures, several successful algorithms have been proposed in recent years. We review some of the theoretical motivations for deep architectures, as well as some of their practical successes, and propose directions of investigation to address some of the remaining challenges.
1 Learning Artificial Intelligence
An intelligent agent makes good decisions, and in order to do so it needs some form of knowledge. Knowledge can be embodied in a function that maps inputs and states to states and actions. If we saw an agent that always took what one would consider good decisions, we would call the agent intelligent. Knowledge can be explicit, as in the symbolically expressed rules and facts of expert systems, or in the form of linguistic statements in an encyclopedia.