Results 1–10 of 17
Representation learning: A review and new perspectives.
 IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI)
, 2013
Abstract

Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Distributed optimization of deeply nested systems
, 2012
Abstract

Cited by 11 (5 self)
In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in a deep artificial neural network, an object recognition cascade in computer vision, or a speech front-end. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult nonconvex numerical optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.
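The alternation described in the abstract can be sketched on a toy two-layer model. Everything below (sizes, initialization, the penalty weight, keeping the first layer fixed) is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

# Toy sketch of the method of auxiliary coordinates (MAC) for a
# two-layer nested model y ≈ W2 @ tanh(W1 @ X): the nested objective is
# replaced by one over an augmented space with auxiliary coordinates Z
# tied to the hidden-layer output by a quadratic penalty, then solved by
# alternating optimization over the parameters and Z.

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))        # 5-dim inputs, 100 samples
Y = rng.normal(size=(2, 100))        # 2-dim targets
W1 = 0.1 * rng.normal(size=(3, 5))   # hidden layer (kept fixed here for brevity)
W2 = 0.1 * rng.normal(size=(2, 3))   # output layer
Z = np.tanh(W1 @ X)                  # initialize Z at the nested value
mu = 1.0                             # quadratic-penalty weight

def penalized_objective():
    fit = np.linalg.norm(Y - W2 @ Z) ** 2            # data-fitting term
    nest = np.linalg.norm(Z - np.tanh(W1 @ X)) ** 2  # nesting-constraint penalty
    return fit + mu * nest

before = penalized_objective()
for _ in range(20):
    # W-step: with Z fixed, the output layer is an ordinary least-squares fit.
    W2 = Y @ np.linalg.pinv(Z)
    # (A full MAC W-step would also fit W1 to min ||Z - tanh(W1 @ X)||^2;
    #  each layer's subproblem is independent.)
    # Z-step: the penalized objective is quadratic in Z, so minimize it exactly.
    A = W2.T @ W2 + mu * np.eye(3)
    Z = np.linalg.solve(A, W2.T @ Y + mu * np.tanh(W1 @ X))
after = penalized_objective()
```

Because each half-step minimizes the penalized objective exactly over its own block of variables, the objective decreases monotonically, which is the source of MAC's convergence guarantee mentioned in the abstract.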
A Hybrid Neural Network-Latent Topic Model
Abstract

Cited by 5 (0 self)
This paper introduces a hybrid model that combines a neural network with a latent topic model. The neural network provides a low-dimensional embedding for the input data, whose subsequent distribution is captured by the topic model. The neural network thus acts as a trainable feature extractor while the topic model captures the group structure of the data. Following an initial pretraining phase to separately initialize each part of the model, a unified training scheme is introduced that allows for discriminative training of the entire model. The approach is evaluated on visual data in a scene classification task, where the hybrid model is shown to outperform models based solely on neural networks or topic models, as well as other baseline methods.
In all likelihood, deep belief is not enough
 Journal of Machine Learning Research
, 2011
Abstract

Cited by 4 (1 self)
Statistical models of natural images provide an important tool for researchers in the fields of machine learning and computational neuroscience. The canonical measure to quantitatively assess and compare the performance of statistical models is given by the likelihood. One class of statistical models which has recently gained increasing popularity and has been applied to a variety of complex data is formed by deep belief networks. Analyses of these models, however, have often been limited to qualitative analyses based on samples due to the computationally intractable nature of their likelihood. Motivated by these circumstances, the present article introduces a consistent estimator for the likelihood of deep belief networks which is computationally tractable and simple to apply in practice. Using this estimator, we quantitatively investigate a deep belief network for natural image patches and compare its performance to the performance of other models for natural image patches. We find that the deep belief network is outperformed with respect to the likelihood even by very simple mixture models.
Differentiable pooling for hierarchical feature learning. arXiv preprint arXiv:1207.0151
, 2012
Abstract

Cited by 3 (0 self)
We introduce a parametric form of pooling, based on a Gaussian, which can be optimized alongside the features in a single global objective function. By contrast, existing pooling schemes are based on heuristics (e.g., the local maximum) and have no clear link to the cost function of the model. Furthermore, the variables of the Gaussian explicitly store location information, distinct from the appearance captured by the features, thus providing a what/where decomposition of the input signal. Although the differentiable pooling scheme can be incorporated into a wide range of hierarchical models, we demonstrate it in the context of a Deconvolutional Network model (Zeiler et al. [22]). We also explore a number of secondary issues within this model and present detailed experiments on MNIST digits.
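The key idea, pooling with a Gaussian whose location parameter is itself differentiable, can be illustrated in one dimension. The function names and the finite-difference check below are illustrative, not the paper's formulation:

```python
import numpy as np

# Hedged sketch of Gaussian (differentiable) pooling over a 1-D window:
# instead of taking a local maximum, weight the activations by a
# Gaussian whose mean mu and width sigma are parameters. The pooled
# value is then differentiable in mu, so location ("where") can be
# optimized by gradient descent alongside the features ("what").

def gaussian_pool(x, mu, sigma):
    """Pooled value = sum_i w_i * x_i with normalized Gaussian weights."""
    idx = np.arange(len(x), dtype=float)
    w = np.exp(-0.5 * ((idx - mu) / sigma) ** 2)
    w /= w.sum()
    return w @ x

def grad_mu(x, mu, sigma, eps=1e-5):
    # Central finite-difference gradient of the pooled value w.r.t. mu,
    # demonstrating differentiability in the location parameter
    # (the argmax in max-pooling has no such gradient).
    return (gaussian_pool(x, mu + eps, sigma)
            - gaussian_pool(x, mu - eps, sigma)) / (2 * eps)

x = np.array([0.1, 0.9, 0.3, 0.05])       # a small window of activations
pooled = gaussian_pool(x, mu=1.0, sigma=0.5)
g = grad_mu(x, mu=1.0, sigma=0.5)         # nonzero: mu receives gradient
```

Since the weights are normalized, the pooled value is a convex combination of the activations, and shifting `mu` toward larger activations increases it, which is exactly the signal a global objective can exploit.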
Optimizing Neural Networks that Generate Images
, 2014
Abstract

Cited by 3 (0 self)
Image recognition, also known as computer vision, is one of the most prominent applications of neural networks. The image recognition methods presented in this thesis are based on the reverse process: generating images. Generating images is easier than recognizing them, for the computer systems that we have today. This work leverages the ability to generate images for the purpose of recognizing other images. One part of this thesis introduces a thorough implementation of this “analysis by synthesis” idea in a sophisticated autoencoder. Half of the image generation system (namely its structure) is hard-coded; the other half (the content inside that structure) is learned. At the same time as this image generation system is being learned, an accompanying image recognition system is learning to extract descriptions from images. Learning together, these two components develop an excellent understanding of the provided data. The second part of the thesis is an algorithm for training undirected generative models, making use of a powerful interaction between training and a Markov chain whose task is to produce samples from the model. This algorithm is shown to work well on image data, but is equally applicable to undirected generative models of other types of data.
An overview of deep-structured learning for information processing
 in Proc. Asia-Pacific Signal & Information Processing Association Annual Summit & Conference (APSIPA ASC)
, 2011
Abstract

Cited by 3 (0 self)
In this paper, I will introduce to the APSIPA audience an emerging area of machine learning, deep-structured learning. It refers to a class of machine learning techniques, developed mostly since 2006, where many layers of information processing stages in hierarchical architectures are exploited for pattern classification and for unsupervised feature learning. First, the brief history of deep learning is discussed. Then, I develop a classificatory scheme to analyze and summarize major work reported in the deep learning literature. Using this scheme, I provide a taxonomy-oriented survey of the existing deep architectures and categorize them into three types: generative, discriminative, and hybrid. Two prime deep architectures, one hybrid and one discriminative, are presented in detail. Finally, selected applications of deep learning are reviewed in broad areas of information processing, including audio/speech, image/video, multimodality, language modeling, natural language processing, and information retrieval.
Data Insufficiency in Sketch Versus Photo Face Recognition
Abstract

Cited by 2 (0 self)
Computerized sketch-face recognition is a crucial element for law enforcement and has received considerable attention in the recent literature. Sketches of a suspect are hand-drawn or computer-rendered based on a verbal description of the suspect. However, the most popular and the only publicly available dataset, i.e. the CUFS face-sketch dataset, is far from realistic because the sketches are hand-drawn with the artist looking at the photographs to be matched later. After years of effort, researchers are producing nearly perfect results. However, we show that this is not because the problem is solved, but because of flaws in the dataset. In this paper, we empirically show that an off-the-shelf face recognition system using simple shape and edge features outperforms more sophisticated state-of-the-art approaches for photo-sketch and sketch-photo matching, even without using training data. We additionally show that using just the hair region gives an 85.22% recognition rate. Based on this empirical evidence, we argue that the current dataset available for face-sketch matching is not appropriate and needs to be replaced by a more realistic one for this field to advance.
Generative Image Modeling Using Spatial LSTMs
 In Advances in Neural Information Processing Systems 28
, 2015
Abstract

Cited by 1 (1 self)
Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies which can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but have only recently found their way into generative image models. Here we introduce a recurrent image model based on multi-dimensional long short-term memory units, which are particularly suited to image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
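The spatial-recurrence idea behind such models can be sketched with a simplified causal 2-D sweep. This is an illustrative stand-in, not the paper's model: a real spatial LSTM uses LSTM gating and richer inputs, whereas here a plain tanh cell with scalar weights plays that role:

```python
import numpy as np

# Minimal sketch of a 2-D ("spatial") recurrence: the hidden state at
# pixel (i, j) is computed from the hidden states of its top and left
# neighbors plus a causal input (the pixel above), so the state at
# (i, j) summarizes only pixels strictly above or to the left. This is
# the causal structure needed to predict pixel (i, j) from its past,
# as in generative image models built on multi-dimensional recurrences.

def spatial_rnn(img, Wh, Wx):
    H, W = img.shape
    h = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            h_top = h[i - 1, j] if i > 0 else 0.0
            h_left = h[i, j - 1] if j > 0 else 0.0
            x = img[i - 1, j] if i > 0 else 0.0  # causal input: pixel above
            h[i, j] = np.tanh(Wh * (h_top + h_left) + Wx * x)
    return h

rng = np.random.default_rng(1)
img = rng.uniform(size=(4, 4))
h = spatial_rnn(img, Wh=0.5, Wx=1.0)   # one diagonal sweep over the image
```

Because the recurrence only ever looks up and left, the model's per-pixel predictive distributions chain together into a tractable likelihood over the whole image, which is the property the abstract highlights.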