Results 1 - 10
of
248
Online learning for matrix factorization and sparse coding
, 2010
"... Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to ad ..."
Abstract
-
Cited by 330 (31 self)
- Add to MetaCart
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets.
Learning mid-level features for recognition
, 2010
"... Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise tra ..."
Abstract
-
Cited by 228 (13 self)
- Add to MetaCart
(Show Context)
Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
"... Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers ..."
Abstract
-
Cited by 92 (34 self)
- Add to MetaCart
(Show Context)
Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches.
Convolutional Learning of Spatio-temporal Features
"... Abstract. We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a ..."
Abstract
-
Cited by 74 (4 self)
- Add to MetaCart
(Show Context)
Abstract. We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent “flow fields ” which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.
Structured Sparse Principal Component Analysis
, 2009
"... We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While ..."
Abstract
-
Cited by 70 (14 self)
- Add to MetaCart
(Show Context)
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, face recognition and the study of the dynamics of a protein complex, demonstrate the benefits of the proposed structured approach over unstructured approaches. 1
Deconvolutional networks
- In CVPR
, 2010
"... Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features ..."
Abstract
-
Cited by 63 (4 self)
- Add to MetaCart
Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features that capture these mid-level cues spontaneously emerge from image data. Our approach is based on the convolutional decomposition of images under a sparsity constraint and is totally unsupervised. By building a hierarchy of such decompositions we can learn rich feature sets that are a robust image representation for both the analysis and synthesis of images. 1.
Ask the locals: multi-way local pooling for image recognition
- in ICCV’11. IEEE
, 2011
"... ..."
(Show Context)
Learning convolutional feature hierarchies for visual recognition
- In Advances in
, 2010
"... We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in ..."
Abstract
-
Cited by 56 (1 self)
- Add to MetaCart
(Show Context)
We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall repre-sentation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasi-sparse features from the input. While patch-based training rarely produces any-thing but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross de-tectors, and oriented grating detectors. We show that using these filters in multi-stage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
Learning Fast Approximations of Sparse Coding
"... In Sparse Coding (SC), input vectors are reconstructed using a sparse linear combination of basis vectors. SC has become a popular method for extracting features from data. For a given input, SC minimizes a quadratic reconstruction error with an L1 penalty term on the code. The process is often too ..."
Abstract
-
Cited by 52 (8 self)
- Add to MetaCart
(Show Context)
In Sparse Coding (SC), input vectors are reconstructed using a sparse linear combination of basis vectors. SC has become a popular method for extracting features from data. For a given input, SC minimizes a quadratic reconstruction error with an L1 penalty term on the code. The process is often too slow for applications such as real-time pattern recognition. We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. The main idea is to train a non-linear, feed-forward predictor with a specific architecture and a fixed depth to produce the best possible approximation of the sparse code. A version of the method, which can be seen as a trainable version of Li and Osher’s coordinate descent method, is shown to produce approximate solutions with 10 times less computation than Li and Osher’s for the same approximation error. Unlike previous proposals for sparse code predictors, the system allows a kind of approximate “explaining away ” to take place during inference. The resulting predictor is differentiable and can be included into globallytrained recognition systems. 1.
Group Sparse Coding
"... Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using ..."
Abstract
-
Cited by 50 (0 self)
- Add to MetaCart
(Show Context)
Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1