Results 1 - 10 of 13
One-shot learning of object categories
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2006
"... Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advant ..."
Abstract
-
Cited by 364 (20 self)
- Add to MetaCart
(Show Context)
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
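To make the prior-to-posterior update concrete, the following minimal sketch (not the paper's constellation model, which relies on variational Bayesian methods) shows how a conjugate Gaussian prior pooled from previously learned categories regularizes a category mean estimated from a single example; the dimensions, variances, and names are illustrative assumptions.

```python
# Minimal sketch: conjugate Gaussian illustration of one-shot learning.
# A prior learned from other categories shrinks the estimate obtained
# from a single training example; not the paper's actual model.
import numpy as np

rng = np.random.default_rng(0)

dim = 10                          # hypothetical appearance-descriptor dimension
prior_mean = np.zeros(dim)        # pooled over previously learned categories (assumed)
prior_var = 1.0                   # prior variance on the category mean (assumed)
obs_var = 0.25                    # assumed observation noise variance

# One training example of the new category.
true_mean = rng.normal(0.0, 1.0, dim)
x = true_mean + rng.normal(0.0, np.sqrt(obs_var), dim)

# Maximum-likelihood estimate from a single example is just that example.
ml_mean = x

# Conjugate posterior for a Gaussian mean: precision-weighted combination
# of the prior and the single observation.
post_precision = 1.0 / prior_var + 1.0 / obs_var
post_mean = (prior_mean / prior_var + x / obs_var) / post_precision

print("ML error:       ", np.linalg.norm(ml_mean - true_mean))
print("Posterior error:", np.linalg.norm(post_mean - true_mean))
```

On average, the shrinkage toward the prior helps precisely when the prior is informative and the training set is tiny, which is the regime the abstract describes.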
Sharing Visual Features for Multiclass And Multiview Object Detection
, 2004
"... We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each clas ..."
Abstract
-
Cited by 279 (6 self)
- Add to MetaCart
(Show Context)
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, both the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects.
Weakly supervised scale-invariant learning of models for visual recognition
- IJCV
, 2007
"... ..."
Efficient Visual Search for Objects in Videos
, 2008
"... We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google [9] retrieves web pages containing particular words, by ..."
Abstract
-
Cited by 31 (0 self)
- Add to MetaCart
We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google [9] retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
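The retrieval pipeline described above, quantizing region descriptors into visual words and scoring with text-retrieval weightings, can be illustrated with a minimal bag-of-visual-words sketch. This is an assumption-laden toy (random stand-in descriptors, a small vocabulary, brute-force scoring instead of an inverted file, and no spatial verification), not the authors' system.

```python
# Minimal bag-of-visual-words sketch: quantize local descriptors into
# "visual words" with k-means, then rank frames against a query using
# tf-idf weighted histograms, as in text retrieval.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Stand-ins for 128-D SIFT-like region descriptors from each video frame.
frames = [rng.normal(size=(rng.integers(80, 120), 128)) for _ in range(20)]

# 1) Build the visual vocabulary from the pooled descriptors.
vocab_size = 50
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(frames))

# 2) Represent each frame as a histogram of visual-word counts.
def word_histogram(descriptors):
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=vocab_size).astype(float)

hists = np.array([word_histogram(f) for f in frames])

# 3) tf-idf weighting: term frequency times inverse document frequency.
tf = hists / hists.sum(axis=1, keepdims=True)
df = (hists > 0).sum(axis=0)
idf = np.log(len(frames) / np.maximum(df, 1))
tfidf = tf * idf

# 4) Rank frames against a query image by cosine similarity.
query = word_histogram(rng.normal(size=(90, 128)))
q = (query / query.sum()) * idf
scores = tfidf @ q / (np.linalg.norm(tfidf, axis=1) * np.linalg.norm(q) + 1e-12)
print("Top-ranked frames:", np.argsort(-scores)[:5])
```

In a full system the sparse histograms would be stored in an inverted file so that only frames sharing visual words with the query are scored, and the ranking would be re-ordered by spatial consistency of the matched regions.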
Clustering in a Boosting Framework
"... Abstract In this paper we present a novel approach for creating partitions of data space using simple clustering algorithms in a boosting framework. A general boosting algorithm for clustering tasks is proposed, and solutions for directly optimizing two loss functions according to this framework are ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
In this paper we present a novel approach for creating partitions of the data space using simple clustering algorithms in a boosting framework. A general boosting algorithm for clustering tasks is proposed, and solutions for directly optimizing two loss functions according to this framework are obtained. Experimental results show how the performance of relatively simple and computationally efficient base clustering algorithms can be boosted using the proposed algorithm.
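The sketch below illustrates the general flavor of boosting a simple base clusterer under assumed design choices (weighted k-means as the base algorithm, exponential reweighting of poorly clustered points, and a co-association matrix as the ensemble output); it does not implement the specific loss functions proposed in the paper.

```python
# Generic boosting-style clustering sketch: each round fits a weighted
# k-means partition, then up-weights points far from their assigned
# centroid so later rounds focus on poorly clustered regions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc, 0.6, size=(100, 2)) for loc in ([0, 0], [4, 0], [2, 3])])

n_rounds, k = 5, 3
weights = np.full(len(X), 1.0 / len(X))
coassoc = np.zeros((len(X), len(X)))   # ensemble output: co-association matrix

for t in range(n_rounds):
    km = KMeans(n_clusters=k, n_init=10, random_state=t)
    labels = km.fit_predict(X, sample_weight=weights)

    # Accumulate how often each pair of points lands in the same cluster.
    coassoc += (labels[:, None] == labels[None, :]).astype(float)

    # Loss proxy: squared distance to the assigned centroid.
    dists = np.linalg.norm(X - km.cluster_centers_[labels], axis=1) ** 2
    print(f"round {t}: weighted distortion = {np.dot(weights, dists):.3f}")

    # Up-weight badly explained points (assumed exponential reweighting).
    weights *= np.exp(dists / (dists.mean() + 1e-12))
    weights /= weights.sum()

coassoc /= n_rounds   # values near 1: pairs consistently clustered together
```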
An Intelligent Model for Visual Scene Analysis and Compression
- IAJIT, First Online Publication
, 2010
"... Abstract: This paper presents an improved approach for indicating visually salient regions of an image based upon a known visual search task. The proposed approach employs a robust model of instantaneous visual attention (i.e. “bottom-up”) combined with a pixel probability map derived from the autom ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
This paper presents an improved approach for indicating visually salient regions of an image based upon a known visual search task. The proposed approach employs a robust model of instantaneous (i.e. “bottom-up”) visual attention combined with a pixel probability map derived from the automatic detection of a previously seen object (task-dependent, i.e. “top-down”). The objects to be recognized are parameterized quickly in advance by a viewpoint-invariant spatial distribution of Speeded Up Robust Features (SURF) interest points. The bottom-up and top-down object probability images are fused to produce a task-dependent saliency map. The proposed approach is validated using observer eye-tracker data collected under an object search-and-count task, and shows 13% higher overlap with true attention areas under task than bottom-up saliency alone. The new combined saliency map is further used to develop a new intelligent compression technique that extends Discrete Cosine Transform (DCT) encoding. The proposed approach is demonstrated on surveillance-style footage throughout.
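A minimal sketch of the map-fusion step follows; the normalization and weighted-sum rule are assumptions for illustration, not necessarily the combination used in the paper.

```python
# Minimal sketch of fusing a bottom-up saliency map with a top-down
# object probability map into a task-dependent saliency map.
import numpy as np

def normalize(m):
    m = m.astype(float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def fuse_saliency(bottom_up, top_down, alpha=0.5):
    """Weighted combination of normalized maps; alpha balances the two cues."""
    bu, td = normalize(bottom_up), normalize(top_down)
    return normalize(alpha * bu + (1.0 - alpha) * td)

# Toy 64x64 maps standing in for the real bottom-up and detector outputs.
rng = np.random.default_rng(3)
bottom_up = rng.random((64, 64))
top_down = np.zeros((64, 64))
top_down[20:40, 20:40] = 1.0          # region where the known object was detected
task_saliency = fuse_saliency(bottom_up, top_down, alpha=0.4)
print(task_saliency.shape, task_saliency.max())
```

In a compression setting, such a map can then drive spatially varying quantization so that bits are concentrated on the task-relevant regions.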
Learning and Using Taxonomies for Visual and Olfactory Classification
, 2013
"... iii ..."
(Show Context)
Frequency Domain Visual Servoing using Planar Contours
- Sixth Indian Conference on Computer Vision, Graphics & Image Processing
"... Fourier domain methods have had a long association with geometric vision. In this paper, we introduce Fourier domain methods into the field of visual servoing for the first time. We show how different properties of Fourier transforms may be used to address specific issues in traditional visual servo ..."
Abstract
- Add to MetaCart
(Show Context)
Fourier domain methods have had a long association with geometric vision. In this paper, we introduce Fourier domain methods into the field of visual servoing for the first time. We show how different properties of Fourier transforms may be used to address specific issues in traditional visual servoing methods, giving rise to algorithms that are more flexible. Specifically, we demonstrate how Fourier analysis may be used to obtain straight camera paths in Cartesian space, perform path following, and carry out correspondenceless visual servoing. Most importantly, by introducing Fourier techniques, we set up a framework into which robust Fourier-based geometry processing algorithms may be incorporated to address the various issues in servoing.
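As one concrete example of a Fourier-domain building block for correspondence-free alignment, the sketch below uses phase correlation to recover the translation between a goal image and the current view; this is a standard technique shown for illustration, not the paper's contour-based servoing scheme.

```python
# Phase correlation: estimate the integer shift between two images from
# the normalized cross-power spectrum, with no feature correspondences.
import numpy as np

def phase_correlation(img_ref, img_cur):
    """Return the (dy, dx) shift such that img_cur ~= roll(img_ref, (dy, dx))."""
    F_ref, F_cur = np.fft.fft2(img_ref), np.fft.fft2(img_cur)
    cross_power = F_cur * np.conj(F_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.real(np.fft.ifft2(cross_power))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the image size to negative values.
    h, w = corr.shape
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)

rng = np.random.default_rng(4)
ref = rng.random((128, 128))
cur = np.roll(ref, shift=(7, -12), axis=(0, 1))   # simulated camera motion
print(phase_correlation(ref, cur))                # expected output: (7, -12)
```

In a servoing loop, an estimate of this kind between the current view and the goal view would feed the control law that drives the camera toward the desired pose.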
Sharing visual features for multiclass and multiview object detection
- IN PRESS, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE