Results 1 - 10 of 35
Learning and transferring mid-level image representations using convolutional neural networks
- In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
"... Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The suc-cess of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level f ..."
Abstract
-
Cited by 71 (3 self)
- Add to MetaCart
(Show Context)
Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
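The transfer recipe this abstract describes can be sketched in a few lines. The snippet below is a minimal modern approximation in PyTorch rather than the authors' exact pipeline (which trains additional adaptation layers on the target set): an ImageNet-pretrained backbone is frozen and only a new head is trained for the 20 PASCAL VOC categories. The model choice, learning rate, and multi-label loss are illustrative assumptions.

```python
# Minimal transfer-learning sketch (PyTorch, assumed setup): reuse ImageNet-
# pretrained convolutional layers as a fixed mid-level representation and
# train only a small classifier head on the target task.
import torch
import torch.nn as nn
from torchvision import models

NUM_TARGET_CLASSES = 20  # e.g. the PASCAL VOC object categories

backbone = models.alexnet(weights="IMAGENET1K_V1")  # layers pretrained on ImageNet

# Freeze the transferred layers so only the new head is estimated,
# keeping the number of trainable parameters small.
for p in backbone.parameters():
    p.requires_grad = False

# Replace the ImageNet-specific output layer with a head for the target task.
backbone.classifier[6] = nn.Linear(4096, NUM_TARGET_CLASSES)

optimizer = torch.optim.SGD(backbone.classifier[6].parameters(), lr=1e-3, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()  # multi-label objective: a VOC image can carry several labels

def train_step(images, labels):
    """One optimization step; images: (B, 3, 224, 224), labels: (B, 20) in {0, 1}."""
    logits = backbone(images)
    loss = criterion(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```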
Learning everything about anything: Webly-supervised visual concept learning
- In CVPR
"... Figure 1: We introduce a fully-automated method that, given any concept, discovers an exhaustive vocabulary explaining all its appearance variations (i.e., actions, interactions, attributes, etc.), and trains full-fledged detection models for it. This figure shows a few of the many variations that o ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
(Show Context)
Figure 1: We introduce a fully-automated method that, given any concept, discovers an exhaustive vocabulary explaining all its appearance variations (i.e., actions, interactions, attributes, etc.), and trains full-fledged detection models for it. This figure shows a few of the many variations that our method has learned for four different classes of concepts: object (horse), scene (kitchen), event (Christmas), and action (walking).
Recognition is graduating from labs to real-world applications. While it is encouraging to see its potential being tapped, it brings forth a fundamental challenge to the vision researcher: scalability. How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models? In this paper, we introduce a fully-automated approach for learning extensive models for a wide range of variations (e.g. actions, interactions, attributes and beyond) within any concept. Our approach leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to alleviate the need for explicit human supervision in training the models. Our approach organizes the visual knowledge about a concept in a convenient and useful way, enabling a variety of applications across vision and NLP. Our online system has been queried by users to learn models for several interesting concepts including breakfast, Gandhi, beautiful, etc. To date, our system has models available for over 50,000 variations within 150 concepts, and has annotated more than 10 million images with bounding boxes.
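As a toy illustration of the vocabulary-discovery step described above (not the full system, which mines Google Books Ngrams and couples discovery with image collection and detector training), the sketch below counts phrases that co-occur with a concept in a tiny in-line corpus; the concept and corpus are placeholders.

```python
# Toy sketch of vocabulary discovery: collect words that co-occur with a
# concept in text and keep the most frequent phrases as its "variations".
# The tiny in-line corpus below is only a placeholder so the sketch runs.
import re
from collections import Counter

CONCEPT = "horse"
corpus = [
    "The jumping horse cleared the fence while the grazing horse watched.",
    "A rocking horse stood in the corner next to the race horse poster.",
]

counts = Counter()
for line in corpus:
    tokens = re.findall(r"[a-z]+", line.lower())
    for i, tok in enumerate(tokens):
        if tok == CONCEPT:
            if i > 0:
                counts[tokens[i - 1] + " " + CONCEPT] += 1   # e.g. "jumping horse"
            if i + 1 < len(tokens):
                counts[CONCEPT + " " + tokens[i + 1]] += 1   # e.g. "horse poster"

vocabulary = [phrase for phrase, _ in counts.most_common(50)]
print(vocabulary)   # discovered variations, usable as image-search queries
```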
Robust multi-resolution pedestrian detection in traffic scenes
- In CVPR, 2013
"... The serious performance decline with decreasing resolu-tion is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian de-tection in different resolutions as different but related prob-lems, and propose a Multi-Task model to jointly consider their ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
(Show Context)
The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we treat pedestrian detection at different resolutions as different but related problems, and propose a multi-task model to jointly consider their commonness and differences. The model contains resolution-aware transformations to map pedestrians at different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to iteratively learn the resolution-aware transformations and the deformable part model (DPM) based detector. In traffic scenes there are many false positives located around vehicles; therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms the previous state of the art (71%).
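A greatly simplified stand-in for the approach sketched in this abstract: per-resolution linear transforms map features into a shared space, where a single linear detector is trained, and the two updates alternate in a coordinate-descent fashion. The real model operates on a deformable part model rather than a plain linear SVM; all dimensions and data below are synthetic assumptions.

```python
# Toy coordinate descent over resolution-aware transforms and a shared detector.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
D_IN, D_SHARED, N = 64, 32, 500            # assumed feature sizes / sample count

# Toy features and labels for two resolutions ("high", "low").
X = {r: rng.normal(size=(N, D_IN)) for r in ("high", "low")}
y = {r: rng.integers(0, 2, size=N) for r in ("high", "low")}

# Resolution-aware transforms, initialised randomly.
W = {r: rng.normal(scale=0.1, size=(D_IN, D_SHARED)) for r in ("high", "low")}
clf = LinearSVC()

for _ in range(5):   # alternate between detector and transform updates
    # 1) Fix the transforms, train the shared detector on pooled projected data.
    Z = np.vstack([X[r] @ W[r] for r in ("high", "low")])
    t = np.concatenate([y[r] for r in ("high", "low")])
    clf.fit(Z, t)
    w = clf.coef_.ravel()

    # 2) Fix the detector, update each transform so projections score with the
    #    correct sign (a least-squares step toward +-w; purely illustrative).
    for r in ("high", "low"):
        target = (2 * y[r] - 1)[:, None] * w[None, :]
        W[r], *_ = np.linalg.lstsq(X[r], target, rcond=None)
```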
Beyond Dataset Bias: Multi-task Unaligned Shared Knowledge Transfer
"... Abstract. Many visual datasets are traditionally used to analyze the performance of different learning techniques. The evaluation is usually done within each dataset, therefore it is questionable if such results are a reliable indicator of true generalization ability. We propose here an algorithm to ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
(Show Context)
Many visual datasets are traditionally used to analyze the performance of different learning techniques. The evaluation is usually done within each dataset; it is therefore questionable whether such results are a reliable indicator of true generalization ability. We propose here an algorithm to exploit the existing data resources when learning on a new multiclass problem. Our main idea is to identify an image representation that decomposes orthogonally into two subspaces: a part specific to each dataset, and a part generic to, and therefore shared between, all the considered source sets. This allows us to use the generic representation as unbiased reference knowledge for a novel classification task. By casting the method in the multi-view setting, we also make it possible to use different features for different databases. We call the algorithm MUST, Multi-task Unaligned Shared knowledge Transfer. Through extensive experiments on five public datasets, we show that MUST consistently improves cross-dataset generalization performance.
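The sketch below is not the MUST algorithm itself, only an illustration of the orthogonal decomposition it relies on: a shared subspace (here crudely taken as the top principal directions of features pooled over all sources) and its orthogonal complement playing the role of the dataset-specific part. All data and dimensions are placeholders.

```python
# Toy orthogonal split of a feature into a shared part and a dataset-specific part.
import numpy as np

rng = np.random.default_rng(0)
D, K = 128, 10                                           # feature dim, shared-subspace dim
sources = [rng.normal(size=(300, D)) for _ in range(3)]  # toy source datasets

pooled = np.vstack(sources)
pooled -= pooled.mean(axis=0)
# Top-K principal directions of the pooled sources stand in for the generic,
# dataset-independent part of the representation.
_, _, Vt = np.linalg.svd(pooled, full_matrices=False)
B = Vt[:K].T                                             # (D, K) orthonormal shared basis

def decompose(x):
    """Split x into a shared component and a dataset-specific remainder."""
    shared = B @ (B.T @ x)
    specific = x - shared                                # orthogonal complement
    return shared, specific

x = sources[0][0]
shared, specific = decompose(x)
assert np.allclose(shared + specific, x)
```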
Continuous Manifold Based Adaptation for Evolving Visual Domains
"... We pose the following question: what happens when test data not only differs from training data, but differs from it in a continually evolving way? The classic domain adap-tation paradigm considers the world to be separated into stationary domains with clear boundaries between them. However, in many ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
We pose the following question: what happens when test data not only differs from training data, but differs from it in a continually evolving way? The classic domain adaptation paradigm considers the world to be separated into stationary domains with clear boundaries between them. However, in many real-world applications, examples cannot be naturally separated into discrete domains, but arise from a continuously evolving underlying process. Examples include video with gradually changing lighting and spam email with evolving spammer tactics. We formulate a novel problem of adapting to such continuous domains, and present a solution based on smoothly varying embeddings. Recent work has shown the utility of considering discrete visual domains as fixed points embedded in a manifold of lower-dimensional subspaces. Adaptation can be achieved via transforms or kernels learned between such stationary source and target subspaces. We propose a method to consider non-stationary domains, which we refer to as Continuous Manifold Adaptation (CMA). We treat each target sample as potentially being drawn from a different subspace on the domain manifold, and present a novel technique for continuous transform-based adaptation. Our approach can learn to distinguish categories using training data collected at some point in the past, and continue to update its model of the categories for some time into the future, without receiving any additional labels. Experiments on two visual datasets demonstrate the value of our approach for several popular feature representations.
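A rough sketch of the continuously evolving setting, using plain subspace alignment as a stand-in for the paper's transform-based adaptation: a sliding window of recent unlabeled target samples defines a time-varying target subspace, and the fixed source subspace is re-aligned to it before each prediction. Window size, subspace dimension, and the drifting stream are illustrative assumptions.

```python
# Sliding-window subspace re-estimation for a drifting target domain.
import numpy as np
from collections import deque
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
D, K = 64, 10
Xs = rng.normal(size=(400, D))                  # labeled source features (toy)
ys = rng.integers(0, 5, size=400)

def top_subspace(X, k):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                             # (D, k) basis

Ps = top_subspace(Xs, K)                        # fixed source subspace
window = deque(maxlen=200)                      # recent unlabeled target samples

def classify(x_t):
    """Classify one incoming target sample under the current window."""
    window.append(x_t)
    if len(window) < 2 * K:                     # not enough data to adapt yet
        Pt = Ps
    else:
        Pt = top_subspace(np.array(window), K)  # time-varying target subspace
    M = Ps.T @ Pt                               # subspace-alignment transform
    clf = KNeighborsClassifier(n_neighbors=1).fit(Xs @ Ps @ M, ys)
    return clf.predict((x_t @ Pt).reshape(1, -1))[0]

# Simulated slowly drifting target stream.
for t in range(50):
    x_t = rng.normal(loc=0.01 * t, size=D)
    _ = classify(x_t)
```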
Inverting and visualizing features for object detection
- arXiv, 2012
"... This paper presents methods to visualize feature spaces commonly used in object detection. The tools in this paper allow a human to put on “feature space glasses ” and see the visual world as a computer might see it. We found that these “glasses ” allow us to gain insight into the behavior of comput ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
This paper presents methods to visualize feature spaces commonly used in object detection. The tools in this paper allow a human to put on “feature space glasses” and see the visual world as a computer might see it. We found that these “glasses” allow us to gain insight into the behavior of computer vision systems. We show a variety of experiments with our visualizations, such as examining the linear separability of recognition in HOG space, generating high scoring “super objects” for an object detector, and diagnosing false positives. We pose the visualization problem as one of feature inversion, i.e. recovering the natural image that generated a feature descriptor. We describe four algorithms to tackle this task, with different trade-offs in speed, accuracy, and scalability. Our most successful algorithm uses ideas from sparse coding to learn a pair of dictionaries that enable regression between HOG features and natural images, and can invert features at interactive rates. We believe these visualizations are useful tools to add to an object detector researcher’s toolbox, and code is available.
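A toy version of the feature-inversion idea: learn a direct regression from HOG descriptors back to pixel patches. The paper's preferred algorithm learns paired sparse-coding dictionaries; the ridge regression below is a much simpler stand-in, and the synthetic patches are placeholders for real image crops.

```python
# Toy HOG-to-pixels inversion via regression (stand-in for paired dictionaries).
import numpy as np
from skimage.feature import hog
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
PATCH = 32

# Synthetic grayscale training patches (use real image crops in practice).
patches = [rng.random((PATCH, PATCH)) for _ in range(500)]
feats = [hog(p, pixels_per_cell=(8, 8), cells_per_block=(2, 2)) for p in patches]

X = np.array(feats)                         # HOG descriptors
Y = np.array([p.ravel() for p in patches])  # flattened pixels

inverter = Ridge(alpha=1.0).fit(X, Y)       # descriptor -> pixels regression

def invert_hog(descriptor):
    """Approximate the patch that generated a HOG descriptor."""
    return inverter.predict(descriptor.reshape(1, -1)).reshape(PATCH, PATCH)

recon = invert_hog(X[0])
print("reconstruction error:", np.mean((recon - patches[0]) ** 2))
```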
Frustratingly easy NBNN domain adaptation
- In Proc. ICCV, 2013
"... Over the last years, several authors have signaled that state of the art categorization methods fail to perform well when trained and tested on data from different databases. The general consensus in the literature is that this issue, known as domain adaptation and/or dataset bias, is due to a distr ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
(Show Context)
Over the last few years, several authors have signaled that state-of-the-art categorization methods fail to perform well when trained and tested on data from different databases. The general consensus in the literature is that this issue, known as domain adaptation and/or dataset bias, is due to a distribution mismatch between data collections. Methods addressing it range from max-margin classifiers to learning how to modify the features and obtain a more robust representation. The large majority of these works use BOW feature descriptors, and learning methods based on image-to-image distance functions. Following the seminal work of [6], in this paper we challenge these two assumptions. We experimentally show that using the NBNN classifier over existing domain adaptation databases consistently achieves very strong performance. We build on this result, and present an NBNN-based domain adaptation algorithm that iteratively learns a class metric while inducing, for each sample, a large margin separation among classes. To the best of our knowledge, this is the first work casting the domain adaptation problem within the NBNN framework. Experiments show that our method achieves the state of the art, both in the unsupervised and semi-supervised settings.
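For reference, a minimal NBNN classifier, the building block that the adaptation method above extends: each class is scored by summing, over all local descriptors of a query image, the squared distance to the nearest descriptor in that class's pool (an image-to-class rather than image-to-image distance). Descriptors below are synthetic.

```python
# Minimal Naive-Bayes Nearest-Neighbour (NBNN) classification sketch.
import numpy as np

def nbnn_predict(query_descriptors, class_pools):
    """query_descriptors: (m, d) array; class_pools: dict label -> (n_c, d) array."""
    best_label, best_score = None, np.inf
    for label, pool in class_pools.items():
        # Squared Euclidean distance from every query descriptor to the pool.
        d2 = ((query_descriptors[:, None, :] - pool[None, :, :]) ** 2).sum(-1)
        score = d2.min(axis=1).sum()        # image-to-class distance
        if score < best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage with random descriptors.
rng = np.random.default_rng(0)
pools = {c: rng.normal(loc=c, size=(200, 16)) for c in range(3)}
query = rng.normal(loc=1, size=(50, 16))
print(nbnn_predict(query, pools))           # expected: class 1
```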
From virtual to reality: Fast adaptation of virtual object detectors to real domains
- BMVC
"... Abstract The most successful 2D object detection methods require a large number of images annotated with object bounding boxes to be collected for training. We present an alternative approach that trains on virtual data rendered from 3D models, avoiding the need for manual labeling. Growing demand ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
The most successful 2D object detection methods require a large number of images annotated with object bounding boxes to be collected for training. We present an alternative approach that trains on virtual data rendered from 3D models, avoiding the need for manual labeling. Growing demand for virtual reality applications is quickly bringing about an abundance of available 3D models for a large variety of object categories. While mainstream use of 3D models in vision has focused on predicting the 3D pose of objects, we investigate the use of such freely available 3D models for multicategory 2D object detection. To address the issue of dataset bias that arises from training on virtual data and testing on real images, we propose a simple and fast adaptation approach based on decorrelated features. We also compare two kinds of virtual data, one rendered with real-image textures and one without. Evaluation on a benchmark domain adaptation dataset demonstrates that our method performs comparably to existing methods trained on large-scale real image domains.
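A generic sketch of feature decorrelation (whitening), the kind of step the adaptation above builds on; the paper's exact procedure may differ. The features below are random placeholders for descriptors extracted from rendered and real images.

```python
# Feature decorrelation: map features through the inverse square root of an
# estimated covariance so that the transformed features have ~identity covariance.
import numpy as np

def fit_whitener(X, eps=1e-3):
    """Return (mean, W) such that (x - mean) @ W is approximately whitened."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False) + eps * np.eye(X.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T    # cov^(-1/2)
    return mu, W

rng = np.random.default_rng(0)
virtual_feats = rng.normal(size=(1000, 64))      # features from rendered data (placeholder)
real_feats = rng.normal(size=(200, 64))          # features from real images (placeholder)

mu, W = fit_whitener(np.vstack([virtual_feats, real_feats]))
virtual_white = (virtual_feats - mu) @ W         # train the detector on these
real_white = (real_feats - mu) @ W               # score these at test time
```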
Looking Beyond the Visible Scene
"... A common thread that ties together many prior works in scene understanding is their focus on the aspects directly present in a scene such as its categorical classification or the set of objects. In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is n ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
A common thread that ties together many prior works in scene understanding is their focus on the aspects directly present in a scene, such as its categorical classification or the set of objects. In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels; it is so much more. From a simple observation of a scene, we can tell a lot about the environment surrounding the scene, such as the potential establishments near it, the potential crime rate in the area, or even the economic climate. Here, we explore several of these aspects from both the human perception and computer vision perspective. Specifically, we show that it is possible to predict the distance of surrounding establishments such as McDonald’s or hospitals even by using scenes located far from them. We go a step further to show that both humans and computers perform well at navigating the environment based only on visual cues from scenes. Lastly, we show that it is possible to predict the crime rates in an area simply by looking at a scene without any real-time criminal activity. Simply put, here, we illustrate that it is possible to look beyond the visible scene.
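In the simplest reading, the prediction tasks above reduce to regression from scene features to a scalar target (distance to the nearest establishment, or a local crime rate). The sketch below shows only that generic setup, with placeholder features and targets; the paper's actual features and models may differ.

```python
# Generic regression setup: scene feature vector -> scalar target.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
scene_features = rng.normal(size=(1000, 512))   # e.g. precomputed scene descriptors (placeholder)
target = rng.random(1000)                       # e.g. distance in km (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(scene_features, target, test_size=0.2,
                                          random_state=0)
model = Ridge(alpha=10.0).fit(X_tr, y_tr)
print("R^2 on held-out scenes:", model.score(X_te, y_te))
```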
Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations
"... personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires pri ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
(Show Context)
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific