Multiclass object recognition with sparse, localized features (2006)

by J. Mutch, D. G. Lowe
Venue: Proc. IEEE CVPR
Results 1 - 10 of 196

Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

by Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng - In ICML, 2009
Cited by 369 (19 self)
Abstract not found

What is the Best Multi-Stage Architecture for Object Recognition?

by Kevin Jarrett, Koray Kavukcuoglu, Yann LeCun
Cited by 252 (22 self)
In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How do the non-linearities that follow the filter banks influence recognition accuracy? 2. Does learning the filter banks in an unsupervised or supervised manner improve performance over random or hard-wired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6%), that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (> 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).

Citation Context

...ient descent [10], convolutional networks trained in supervised mode with an auxiliary task [3], or trained in purely unsupervised mode [25, 11, 18]. Multi-stage systems also include HMAX-type models [28, 22] in which the first layer is hardwired with Gabor filters, and the second layer is trained in unsupervised mode by storing randomly-picked output configurations from the first stage into filters of the...
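The filter bank → rectification → pooling pipeline this abstract describes can be sketched in one dimension (an illustrative toy, not the authors' implementation; the signal, kernel, and window size below are arbitrary):

```python
def convolve1d(signal, kernel):
    """Valid-mode 1-D convolution: one 'filter bank' response."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

def rectify(responses):
    """Pointwise rectification (absolute value), the kind of
    non-linearity the paper found most important for accuracy."""
    return [abs(r) for r in responses]

def max_pool(responses, size):
    """Non-overlapping max pooling over windows of `size`."""
    return [max(responses[i:i + size])
            for i in range(0, len(responses) - size + 1, size)]

signal = [1, -1, 2, -2, 3, -3, 4, -4]
features = max_pool(rectify(convolve1d(signal, [1, -1])), 2)  # [3, 5, 7]
```

A two-stage system in the paper's sense simply runs a second filter-rectify-pool stage on the output of the first.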

Beyond sliding windows: Object localization by efficient subwindow search

by Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann - In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008
Cited by 224 (11 self)
Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.

Citation Context

... 4 × 4 pyramid 1.5% 7.9% ESS w/ bag-of-visual-words 10.0% 71.2% Agarwal et al. [1] 23.5% 60.4% Fergus et al. [10] 11.5% — Leibe et al. [15] 2.5% 5.0% Fritz et al. [12] 11.4% 12.2% Mutch/Lowe [18] 0.04% 9.4% Table 2. Error rates on UIUC Cars dataset at the point of equal precision and recall. the finer pyramid levels, regions of specific characteristics form, e.g. the wheels become very disc...

Unsupervised learning of invariant feature hierarchies with application to object recognition

by Fu-Jie Huang, Y-Lan Boureau, Yann LeCun - In CVPR, 2007
Cited by 195 (17 self)
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a pointwise sigmoid non-linearity, and a feature-pooling layer that computes the max of each filter output within adjacent windows. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64% error on MNIST, and 54% average recognition rate on Caltech 101.

Citation Context

...iple such levels are stacked, the resulting architecture is essentially identical to the Neocognitron [7], the Convolutional Network [13, 10], and the HMAX, or so-called “Standard Model” architecture [20, 17]. All of those models use alternating layers of convolutional feature detectors (reminiscent of Hubel and Wiesel’s simple cells), and local pooling and subsampling of feature maps using a max or an av...

Class-specific hough forests for object detection

by Juergen Gall, Victor Lempitsky - In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009
Cited by 151 (18 self)
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via a generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas the previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach to object part detection. Towards this end, we train a class-specific Hough forest, which is a random forest that directly maps the image patch appearance to the probabilistic vote about the possible location of the object centroid. We demonstrate that Hough forests improve the results of Hough-transform object detection significantly and achieve state-of-the-art performance for several classes and datasets.

Citation Context

...h-based methods Implicit Shape Model [10] 91% – ISM+verification [10] 97.5% 95% Boundary Shape Model [17] 85% – Random forest based method LayoutCRF [27] 93% – State-of-the-art Mutch and Lowe CVPR’06 [15] 99.9% 90.6% Lampert et al. CVPR’08 [9] 98.5% 98.6% Our approach Hough Forest 98.5% 98.6% HF - Weaker supervision 94.4% – Table 1. Performance of different methods on the two UIUC car datasets at reca...
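The Hough-voting idea in the abstract above can be sketched as a toy: in a real Hough forest, trained trees map patch appearance to centroid-offset votes; here a hand-built lookup table stands in for the forest, and all appearances, offsets, and grid sizes are illustrative:

```python
# Hypothetical vote table standing in for a trained Hough forest:
# each patch appearance votes for likely object-centroid offsets.
votes_by_patch = {
    "wheel": [(0, -2), (0, 2)],  # a wheel votes 2 px left and right
    "roof":  [(2, 0)],           # a roof votes 2 px below
}

def hough_detect(patches, shape):
    """patches: list of (row, col, appearance) tuples.
    Accumulates votes into a Hough image and returns the
    most-voted centroid hypothesis (the Hough-image maximum)."""
    hough = [[0] * shape[1] for _ in range(shape[0])]
    for r, c, appearance in patches:
        for dr, dc in votes_by_patch.get(appearance, []):
            rr, cc = r + dr, c + dc
            if 0 <= rr < shape[0] and 0 <= cc < shape[1]:
                hough[rr][cc] += 1
    best = max((hough[r][c], (r, c))
               for r in range(shape[0]) for c in range(shape[1]))
    return best[1]
```

Two "wheel" patches and a "roof" patch whose votes agree on one cell make that cell the detection hypothesis, which is the mechanism the discriminatively trained forest optimizes (small-variance votes).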

Learning globally-consistent local distance functions for shape-based image retrieval and classification

by Andrea Frome, Fei Sha, Yoram Singer, Jitendra Malik - In ICCV, 2007
Cited by 149 (3 self)
We address the problem of visual category recognition by learning an image-to-image distance function that attempts to satisfy the following property: the distance between images from the same category should be less than the distance between images from different categories. We use patch-based feature vectors common in object recognition work as a basis for our image-to-image distance functions. Our large-margin formulation for learning the distance functions is similar to formulations used in the machine learning literature on distance metric learning; however, we differ in that we learn local distance functions, a different parameterized function for every image of our training set, whereas typically a single global distance function is learned. This was a novel approach first introduced in Frome, Singer, & Malik, NIPS 2006. In that work we learned the local distance functions independently, and the outputs of these functions could not be compared at test time without the use of additional heuristics or training. Here we introduce a different approach that has the advantage that it learns distance functions that are globally consistent in that they can be directly compared for purposes of retrieval and classification. The outputs of the learning algorithm are weights assigned to the image features, which is intuitively appealing in the computer vision setting: some features are more salient than others, and which are more salient depends on the category, or image, being considered. We train and test using the Caltech 101 object recognition benchmark. Using fifteen training images per category, we achieved a mean recognition rate of 63.2% and

Citation Context

...ecognition across categories. Since then, there have been great improvements in recognition performance on the 2004 benchmark, with most algorithms making use of some variant of geometric blur or SIFT [1, 25, 11, 13, 9, 8, 16, 19]. Of this work, [9], [13], and [8] focused specifically on defining good image-to-image kernel functions over sets of patch-based features for use with support vector machines (SVMs). In the first two ...
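The distance function the abstract describes can be sketched as a weighted sum of elementary patch-feature distances, each being a focal-image feature's distance to its nearest feature in the other image. This is a minimal sketch only: the large-margin weight learning is omitted, and the weights and features below are illustrative, not learned:

```python
def elementary_distance(feature, other_features):
    """Euclidean distance from one focal-image feature to its
    nearest matching feature in the other image."""
    return min(sum((a - b) ** 2 for a, b in zip(feature, g)) ** 0.5
               for g in other_features)

def image_distance(focal_features, focal_weights, other_features):
    """Image-to-image distance: a per-focal-image weighted sum of
    elementary distances (weights would come from the learner)."""
    return sum(w * elementary_distance(f, other_features)
               for f, w in zip(focal_features, focal_weights))
```

For retrieval, database images are ranked by this distance to the focal image; the paper's contribution is making such rankings comparable across focal images.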

Efficient Subwindow Search: A Branch and Bound Framework for Object Localization

by Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann - IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 120 (9 self)
Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To estimate the object’s location one can take a sliding window approach, but this strongly increases the computational cost, because the classifier or similarity function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch and bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages. It converges to a globally optimal solution typically in linear or even sublinear time, in contrast to the quadratic scaling of exhaustive or sliding window search. We show how our method is applicable to different object detection and image retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art localization performance of the resulting systems on the
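The branch-and-bound idea can be sketched for the simplest quality function, a sum of per-pixel scores: a candidate set of windows is given by ranges for top/bottom/left/right, and its bound adds the positive scores of the largest window in the set to the negative scores of the smallest. This is an illustrative simplification under that assumption, not the paper's general framework, and the score grid is arbitrary:

```python
import heapq

def prefix_sums(grid):
    """2-D prefix-sum (integral image) for O(1) rectangle sums."""
    h, w = len(grid), len(grid[0])
    p = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            p[r + 1][c + 1] = grid[r][c] + p[r][c + 1] + p[r + 1][c] - p[r][c]
    return p

def rect_sum(p, t, b, l, r):
    """Sum over the inclusive rectangle; 0 if the rectangle is empty."""
    if b < t or r < l:
        return 0
    return p[b + 1][r + 1] - p[t][r + 1] - p[b + 1][l] + p[t][l]

def ess(scores):
    """Best-first branch-and-bound over all subwindows of a score grid.
    Returns (top, bottom, left, right) of the maximum-sum window."""
    h, w = len(scores), len(scores[0])
    pos = prefix_sums([[max(0, v) for v in row] for row in scores])
    neg = prefix_sums([[min(0, v) for v in row] for row in scores])

    def bound(state):  # state = ((t_lo,t_hi),(b_lo,b_hi),(l_lo,l_hi),(r_lo,r_hi))
        (t0, t1), (b0, b1), (l0, l1), (r0, r1) = state
        return (rect_sum(pos, t0, b1, l0, r1)     # + scores of largest window
                + rect_sum(neg, t1, b0, l1, r0))  # - scores of smallest window

    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    heap = [(-bound(start), start)]
    while True:
        _, state = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in state]
        if max(widths) == 0:                      # a single window: optimal
            return tuple(lo for lo, _ in state)
        i = widths.index(max(widths))             # split the widest range
        lo, hi = state[i]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            child = state[:i] + (half,) + state[i + 1:]
            heapq.heappush(heap, (-bound(child), child))
```

The bound is exact for singleton sets and never underestimates, so the first singleton popped from the max-priority queue is globally optimal, which is what lets the search avoid evaluating every subwindow.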

Learning Invariant Features through Topographic Filter Maps

by Koray Kavukcuoglu, Rob Fergus, Yann LeCun
Cited by 119 (20 self)
Several recently-proposed architectures for high-performance object recognition are composed of two main stages: a feature extraction stage that extracts locally-invariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a non-linear transform, such as a point-wise squashing function, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally-invariant outputs. The learned feature descriptors give results comparable to SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.

Citation Context

... is the feature extractor. Many of the recent proposals for object recognition systems are based on feature descriptors extracted from local patches placed at regularly-spaced grid-points on the image [13, 11, 25, 18, 22]. The most successful and most commonly-used descriptors such as SIFT and HoG [15, 3] are designed to be invariant (or robust) to minor transformations of the input, such as translations, rotations, a...
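The pooling step of a topographic filter map can be sketched as follows: filter outputs live on a 2-D map, and each pooling unit takes the square root of the summed squares of the responses in its neighborhood, so filters that end up neighbors on the map get pooled together. The map layout and responses here are illustrative, and the joint learning of filters and map is omitted:

```python
import math

def topographic_pool(responses, radius=1):
    """responses: 2-D grid of filter outputs (the topographic map).
    Each pooling unit outputs sqrt of the sum of squared responses
    in its (2*radius+1)-square neighborhood, clipped at the border."""
    h, w = len(responses), len(responses[0])
    pooled = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += responses[rr][cc] ** 2
            pooled[r][c] = math.sqrt(total)
    return pooled
```

Because the pooled output is an energy over a neighborhood of similar filters, small shifts of activation within a neighborhood leave it roughly unchanged, which is the source of the local invariance.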

Image retrieval and classification using local distance functions

by Andrea Frome, Yoram Singer, Jitendra Malik - Advances in Neural Information Processing Systems, 2006
Cited by 107 (3 self)
In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang et al.

Citation Context

...t compared to small geometric blur features. In our experiments we have not made use of geometric relationships between features, but this could be incorporated in a manner similar to that in [11] or [16]. 4 Image Browsing, Retrieval, and Classification The learned distance functions induce rankings that could naturally be the basis for a browsing application over a closed set of images. Consider a ra...

Hough Forests for Object Detection, Tracking, and Action Recognition

by Juergen Gall, Angela Yao, Nima Razavi, Luc Van Gool, Victor Lempitsky
Cited by 97 (23 self)
The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance, and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark datasets and comparisons with the state-of-the-art.