Results 1 - 10
of
443
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
"... We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be repurposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks an ..."
Abstract
-
Cited by 203 (22 self)
- Add to MetaCart
(Show Context)
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be repurposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms. 1.
The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
, 2011
"... While vector quantization (VQ) has been applied widely to generate features for visual recognition problems, much recent work has focused on more powerful methods. In particular, sparse coding has emerged as a strong alternative to traditional VQ approaches and has been shown to achieve consistently ..."
Abstract
-
Cited by 149 (7 self)
- Add to MetaCart
While vector quantization (VQ) has been applied widely to generate features for visual recognition problems, much recent work has focused on more powerful methods. In particular, sparse coding has emerged as a strong alternative to traditional VQ approaches and has been shown to achieve consistently higher performance on benchmark datasets. Both approaches can be split into a training phase, where the system learns a dictionary of basis functions, and an encoding phase, where the dictionary is used to extract features from new inputs. In this work, we investigate the reasons for the success of sparse coding over VQ by decoupling these phases, allowing us to separate out the contributions of training and encoding in a controlled way. Through extensive experiments on CIFAR, NORB and Caltech 101 datasets, we compare several training and encoding schemes, including sparse coding and a form of VQ with a soft threshold activation function. Our results show not only that we can use fast VQ algorithms for training, but that we can just as well use randomly chosen exemplars from the training set. Rather than spend resources on training, we find it is more important to choose a good encoder—which can often be a simple feed forward non-linearity. Our results include state-of-the-art performance on both CIFAR and NORB.
Hashing with graphs
- In ICML
, 2011
"... Hashing is becoming increasingly popular for efficient nearest neighbor search in massive databases. However, learning short codes that yield good search performance is still a challenge. Moreover, in many cases realworld data lives on a low-dimensional manifold, which should be taken into account t ..."
Abstract
-
Cited by 108 (29 self)
- Add to MetaCart
Hashing is becoming increasingly popular for efficient nearest neighbor search in massive databases. However, learning short codes that yield good search performance is still a challenge. Moreover, in many cases realworld data lives on a low-dimensional manifold, which should be taken into account to capture meaningful nearest neighbors. In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. To make such an approach computationally feasible, we utilize Anchor Graphs to obtain tractable low-rank adjacency matrices. Our formulation allows constant time hashingof a newdatapointbyextrapolatinggraphLaplacian eigenvectors to eigenfunctions. Finally, we describe a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy. Experimental comparison with the other state-of-the-art methods on two large datasets demonstrates the efficacy of the proposed method. 1.
Semantic segmentation with second-order pooling
- In ECCV
"... Abstract. Feature extraction, coding and pooling, are important components on many contemporary object recognition paradigms. In this paper we explore novel pooling techniques that encode the second-order statistics of local descriptors inside a region. To achieve this effect, we introduce multiplic ..."
Abstract
-
Cited by 91 (12 self)
- Add to MetaCart
(Show Context)
Abstract. Feature extraction, coding and pooling, are important components on many contemporary object recognition paradigms. In this paper we explore novel pooling techniques that encode the second-order statistics of local descriptors inside a region. To achieve this effect, we introduce multiplicative second-order analogues of average and maxpooling that together with appropriate non-linearities lead to state-ofthe-art performance on free-form region recognition, without any type of feature coding. Instead of coding, we found that enriching local descriptors with additional image information leads to large performance gains, especially in conjunction with the proposed pooling methodology. We show that second-order pooling over free-form regions produces results superior to those of the winning systems in the Pascal VOC 2011 semantic segmentation challenge, with models that are 20,000 times faster.
Learning A Discriminative Dictionary for Sparse Coding via Label Consistent K-SVD
"... A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse ..."
Abstract
-
Cited by 88 (9 self)
- Add to MetaCart
(Show Context)
A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistent constraint called ‘discriminative sparse-code error ’ and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single over-complete dictionary and an optimal linear classifier jointly. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse coding techniques for face and object category recognition under the same learning conditions. 1.
Kernel Descriptors for Visual Recognition
"... The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show th ..."
Abstract
-
Cited by 69 (13 self)
- Add to MetaCart
(Show Context)
The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet. 1
L.: Combining randomization and discrimination for fine-grained image categorization
- In: Proc CVPR (2011
"... In this paper, we study the problem of fine-grained image categorization. The goal of our method is to explore fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining two ideas, discriminative feature mining and randomization. Discrimina ..."
Abstract
-
Cited by 66 (6 self)
- Add to MetaCart
(Show Context)
In this paper, we study the problem of fine-grained image categorization. The goal of our method is to explore fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining two ideas, discriminative feature mining and randomization. Discriminative feature mining allows us to model the detailed information that distinguishes different classes of images, while randomization allows us to handle the huge feature space and prevents over-fitting. We propose a random forest with discriminative decision trees algorithm, where every tree node is a discriminative classifier that is trained by combining the information in this node as well as all upstream nodes. Our method is tested on both subordinate categorization and activity recognition datasets. Experimental results show that our method identifies semantically meaningful visual information and outperforms stateof-the-art algorithms on various datasets. 1.
Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds
"... Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far techniques are tested in artificially controlled settings. Typically the vision researcher has already determined the dataset’s scope, the labels “actively ” obtained are i ..."
Abstract
-
Cited by 64 (5 self)
- Add to MetaCart
(Show Context)
Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far techniques are tested in artificially controlled settings. Typically the vision researcher has already determined the dataset’s scope, the labels “actively ” obtained are in fact already known, and/or the crowd-sourced collection process is iteratively fine-tuned. We present an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowd-sourced annotations on images crawled from the Web. To address the technical issues such a large-scale system entails, we introduce a novel part-based detector amenable to linear classifiers, and show how to identify its most uncertain instances in sub-linear time with a hashingbased solution. We demonstrate the approach with experiments of unprecedented scale and autonomy, and show it successfully improves the state-of-the-art for the most challenging objects in the PASCAL benchmark. In addition, we show our detector competes well with popular nonlinear classifiers that are much more expensive to train. 1.
Ask the locals: multi-way local pooling for image recognition
- in ICCV’11. IEEE
, 2011
"... ..."
(Show Context)
Beyond spatial pyramids: Receptive field learning for pooled image features
- In CVPR
"... We examine the effect of receptive field designs on the classification accuracy in the commonly adopted pipeline of image classification. While existing al-gorithms use manually defined spatial regions for pooling, we argue that learn-ing more adaptive receptive field increases performance even with ..."
Abstract
-
Cited by 61 (1 self)
- Add to MetaCart
(Show Context)
We examine the effect of receptive field designs on the classification accuracy in the commonly adopted pipeline of image classification. While existing al-gorithms use manually defined spatial regions for pooling, we argue that learn-ing more adaptive receptive field increases performance even with a significantly smaller codebook size at the coding layer. To this end, we adopt the idea of over-completeness by starting with a large number of receptive field candidates, and train a classifier that only uses a sparse subset of them. An efficient algorithm based on grafting is proposed to perform feature selection from a set of pooling candidates. With this method, we achieve the best published performance on the CIFAR-10 dataset, using a much lower dimensional feature space than previous methods did. 1