In defense of nearest-neighbor based image classification (2008)

by O. Boiman, E. Shechtman, M. Irani
Venue: CVPR.

Citing documents: results 1 to 10 of 266

ImageNet: A large-scale hierarchical image database

by Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei - In CVPR, 2009
Cited by 840 (28 self)
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full-resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

Citation Context

...clean ImageNet dataset. This result shows that having more accurate data improves classification performance. 3. NBNN: We also implement the Naive Bayesian Nearest Neighbor (NBNN) method proposed in [5] to underline the usefulness of full-resolution images. NBNN employs a bag-of-features representation of images. SIFT [15] descriptors are used in this experiment. Given a query image Q with descripto...
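NBNN, the method of the paper cited as [5] in the snippet above, replaces quantized histograms with image-to-class distances. Each query descriptor is matched to its nearest descriptor among all descriptors pooled from one class. The squared distances are summed over the query's descriptors, and the class with the smallest total wins.

The sketch below is minimal and rests on several assumptions:
- Descriptors are plain lists of floats; the toy 2-D vectors stand in for SIFT.
- Nearest-neighbor search is exact brute force. A real system would likely swap in an approximate index.
- The function names are hypothetical.

```python
from math import inf


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(query_descriptors, class_descriptors):
    """Return the class label minimizing sum_i ||d_i - NN_C(d_i)||^2.

    query_descriptors: descriptor vectors extracted from the query image Q.
    class_descriptors: dict mapping each class label to all descriptors
        pooled from that class's training images (no quantization).
    """
    best_label, best_total = None, inf
    for label, pool in class_descriptors.items():
        # Image-to-class distance: each query descriptor is matched to its
        # nearest descriptor anywhere in the class (brute-force search).
        total = sum(min(sq_dist(d, p) for p in pool) for d in query_descriptors)
        if total < best_total:
            best_label, best_total = label, total
    return best_label


# Toy 2-D "descriptors" stand in for SIFT vectors.
classes = {
    "cat": [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]],
    "car": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]],
}
print(nbnn_classify([[0.1, 0.1], [0.3, 0.0]], classes))  # cat
```

Because no codebook is built, nothing is lost to quantization. That is the property the later snippets credit to Boiman et al.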

Linear spatial pyramid matching using sparse coding for image classification

by Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang - In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
Cited by 497 (21 self)
Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity of O(n²∼n³) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks by using a single type of descriptor.

Citation Context

...followed by experiment results in Sec. 5. Finally, Sec. 6 concludes our paper. 2. Related Work: Over the years many works have been done to improve the traditional BoF model, such as generative methods in [7, 21, 3, 1] for modeling the co-occurrence of the codewords or descriptors, discriminative codebook learning in [10, 5, 19, 27] instead of standard unsupervised K-means clustering, and spatial pyramid matching k...
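The abstract above pairs sparse coding with "multi-scale spatial max pooling". Below is a minimal sketch of the pooling step only. It assumes:
- The per-descriptor codes are already computed; the sparse-coding solver is not shown.
- Positions are normalized to [0, 1).
- The pyramid uses 1x1, 2x2 and 4x4 grids and pools the maximum absolute code value per cell. Both choices are our reading, not taken from the abstract.

```python
def spatial_max_pool(features, code_dim, levels=(1, 2, 4)):
    """Multi-scale spatial max pooling of per-descriptor codes.

    features: list of (x, y, code) with x, y in [0, 1) and code a list of
        length code_dim (e.g. the sparse code of one SIFT descriptor).
    levels: grid sizes of the pyramid (assumed default 1x1, 2x2, 4x4).
    Returns the concatenation of per-cell max(|code|) vectors.
    """
    pooled = []
    for g in levels:
        # One pooled vector per cell of the g x g grid, row-major order.
        cells = [[0.0] * code_dim for _ in range(g * g)]
        for x, y, code in features:
            col = min(int(x * g), g - 1)
            row = min(int(y * g), g - 1)
            cell = cells[row * g + col]
            for j, value in enumerate(code):
                cell[j] = max(cell[j], abs(value))
        for cell in cells:
            pooled.extend(cell)
    return pooled


feats = [(0.1, 0.1, [0.0, 0.8]), (0.9, 0.9, [-0.5, 0.0])]
vec = spatial_max_pool(feats, code_dim=2)
print(len(vec), vec[:2])  # 42 [0.5, 0.8]
```

The pooled vector has code_dim * (1 + 4 + 16) entries. A linear SVM trained on it gives the linear training and constant-time testing behavior the abstract describes.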

Locality-constrained linear coding for image classification

by Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, Yihong Gong - In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
Cited by 443 (20 self)
The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With a linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy [22], the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method by first performing a K-nearest-neighbor search and then solving a constrained least-squares fitting problem, bearing a computational complexity of O(M + K²). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical value of LLC for real applications.

Citation Context

...the information about the spatial layout of features, hence it is incapable of capturing shapes or locating an object. Of the many extensions of the BoF method, including the generative part models [7, 3, 2], geometric correspondence search [1, 14] and discriminative codebook learning [13, 17, 23], the most successful results were reported...
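The LLC abstract describes the fast approximation in two steps: a K-nearest-neighbor search, then a least-squares fit with a sum-to-one constraint. Below is a minimal sketch of that two-step procedure under stated assumptions:
- The constrained fit is solved in closed form as C w = 1 followed by rescaling. Here C is the covariance of the selected bases shifted to the descriptor.
- A small trace-scaled ridge (eps) is added for numerical stability.
- The names llc_code and _solve are hypothetical.
- Pure Python replaces a linear-algebra library so the example stays self-contained.

```python
def _solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def llc_code(x, codebook, k=3, eps=1e-4):
    """Approximated LLC code of descriptor x over codebook (list of basis vectors)."""
    k = min(k, len(codebook))
    # Step 1: K-nearest-neighbor search among the codebook bases.
    nearest = sorted(
        range(len(codebook)),
        key=lambda j: sum((b - v) ** 2 for b, v in zip(codebook[j], x)),
    )[:k]
    # Step 2: shift the selected bases so x is the origin, z_j = b_j - x.
    z = [[b - v for b, v in zip(codebook[j], x)] for j in nearest]
    # Local covariance C = z z^T, with a small trace-scaled ridge for conditioning.
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    ridge = eps * sum(cov[i][i] for i in range(k))
    for i in range(k):
        cov[i][i] += ridge
    # Step 3: solve C w = 1 and rescale so the weights sum to one.
    w = _solve(cov, [1.0] * k)
    total = sum(w)
    code = [0.0] * len(codebook)
    for j, wj in zip(nearest, w):
        code[j] = wj / total
    return code


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print([round(c, 3) for c in llc_code([0.25, 0.25], codebook)])
```

Only the K selected bases receive nonzero weight, so the code is sparse by construction. The solve touches a K x K system regardless of codebook size, which matches the O(M + K²) cost quoted in the abstract.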

Improving the Fisher kernel for large-scale image classification

by Florent Perronnin, Jorge Sánchez, Thomas Mensink - In ECCV, 2010
Cited by 362 (20 self)
The Fisher kernel (FK) is a generic framework which combines the benefits of generative and discriminative approaches. In the context of image classification the FK was shown to extend the popular bag-of-visual-words (BOV) by going beyond count statistics. However, in practice, this enriched representation has not yet shown its superiority over the BOV. In the first part we show that with several well-motivated modifications over the original framework we can boost the accuracy of the FK. On PASCAL VOC 2007 we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on Caltech 256. A major advantage is that these results are obtained using only SIFT descriptors and costless linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images to learn classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images we show that classifiers learned on Flickr groups perform surprisingly well (although they were not intended for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.

Citation Context

...are inherently limited by the shortcomings of the BOV representation, and especially by the fact that the descriptor quantization is a lossy process as underlined in the work of Boiman et al. [19]. Hence, efficient alternatives to the BOV histogram have been sought. Bo and Sminchisescu [20] proposed the Efficient Match Kernel (EMK) which consists in mapping the local descriptors to a low-dimen...
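The Fisher kernel abstract mentions "several well-motivated modifications" without listing them. To our knowledge they include two steps applied before training a linear classifier:
- A signed power normalization of the Fisher vector, with exponent 0.5.
- An L2 normalization of the result.

Treat that as an assumption to check against the paper itself. Below is a minimal sketch of those two steps on a plain list of floats; the function name is hypothetical.

```python
import math


def power_l2_normalize(vector, alpha=0.5):
    """Signed power normalization z -> sign(z) * |z|**alpha, then L2 normalization."""
    powered = [math.copysign(abs(z) ** alpha, z) for z in vector]
    norm = math.sqrt(sum(z * z for z in powered))
    if norm == 0.0:
        return powered  # all-zero input: nothing to rescale
    return [z / norm for z in powered]


print(power_l2_normalize([4.0, -9.0, 0.0]))  # approx. [0.555, -0.832, 0.0]
```

The power step compresses large components relative to small ones. The L2 step puts every image on the unit sphere, so a dot product in a linear classifier compares images on a common scale.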

Computer Vision: Algorithms and Applications

by Richard Szeliski, 2010
Cited by 252 (2 self)
No abstract available.

Adapting visual category models to new domains

by Kate Saenko, Brian Kulis, Mario Fritz, Trevor Darrell - In ECCV, 2010
Cited by 163 (20 self)
Domain adaptation is an important emerging topic in computer vision. In this paper, we present one of the first studies of domain shift in the context of object recognition. We introduce a method that adapts object models acquired in a particular visual domain to new imaging conditions by learning a transformation that minimizes the effect of domain-induced changes in the feature distribution. The transformation is learned in a supervised manner and can be applied to categories for which there are no labeled examples in the new domain. While we focus our evaluation on object recognition tasks, the transform-based adaptation technique we develop is general and could be applied to non-image data. Another contribution is a new multi-domain object database, freely available for download. We experimentally demonstrate the ability of our method to improve recognition on categories with few or no target domain labels and moderate to large changes in the imaging conditions.

Citation Context

...1. Introduction: Supervised classification methods, such as kernel-based and nearest-neighbor classifiers, have been shown to perform very well on standard object recognition tasks (e.g. [4], [17], [3]). However, many such methods expect the test images to come from the same distribution as the training images, and often fail when presented with a novel visual domain. While the problem of domain ad...

Visual Word Ambiguity

by J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, J. M. Geusebroek - Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 140 (11 self)
This paper studies automatic image classification by modeling soft-assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been applied successfully for some years. In this paper we investigate four types of soft-assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard-assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known datasets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.

Efficient Object Category Recognition Using

by Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon
Cited by 122 (9 self)
We introduce a new descriptor for images which allows the construction of efficient and compact classifiers with good accuracy on object category recognition. The descriptor is the output of a large number of weakly trained object category classifiers on the image. The trained categories are selected from an ontology of visual concepts, but the intention is not to encode an explicit decomposition of the scene. Rather, we accept that existing object category classifiers often encode not the category per se but ancillary image characteristics; and that these ancillary characteristics can combine to represent visual classes unrelated to the constituent categories' semantic meanings. The advantage of this descriptor is that it allows object-category queries to be made against image databases using efficient classifiers (efficient at test time) such as linear support vector machines, and allows these queries to be for novel categories. Even when the representation is reduced to 200 bytes per image, classification accuracy on object category recognition is comparable with the state of the art (36% versus 42%), but at orders of magnitude lower computational cost.

Citation Context

...descriptors used in the multiple-kernel technique. Note that although we shall later be specific about the number of bits per element of d, this is not required for the current discussion. Boiman et al. [2] show one of the most intriguing results on the Caltech 256 benchmark: a nearest-neighbour-like classifier on low-level feature descriptors produces excellent performance, especially with small train...
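The descriptor in this abstract is simply the vector of outputs of many pre-trained category classifiers. A new category is then a cheap linear classifier on that vector.

Below is a minimal sketch under stated assumptions:
- The weak classifiers are hypothetical linear models given as (weights, bias) pairs.
- The compact form keeps one bit per classifier, thresholded at zero. Whether this matches the paper's exact reduction to 200 bytes is an assumption.
- All function names are hypothetical.

```python
def classeme_descriptor(features, weak_classifiers):
    """Descriptor = scores of many pre-trained weak linear classifiers.

    weak_classifiers: list of (weights, bias) pairs, one per ontology concept
    (hypothetical stand-ins for the paper's trained category classifiers).
    """
    return [sum(w * f for w, f in zip(weights, features)) + bias
            for weights, bias in weak_classifiers]


def binarize(descriptor):
    """Compact form: one bit per classifier (score > 0), packed 8 per byte."""
    packed = bytearray((len(descriptor) + 7) // 8)
    for i, score in enumerate(descriptor):
        if score > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def novel_category_score(descriptor, weights, bias):
    """A new category is just a linear classifier on the descriptor: cheap at test time."""
    return sum(w * s for w, s in zip(weights, descriptor)) + bias


weak = [([1.0, 0.0], 0.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.5)]
desc = classeme_descriptor([2.0, 0.5], weak)
print(desc, binarize(desc))  # [2.0, -0.5, 3.0] b'\x05'
```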

What does classifying more than 10,000 image categories tell us?

by Jia Deng, Alexander C. Berg, Kai Li, Li Fei-Fei
Cited by 118 (11 self)
Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.

Citation Context

...classifiers using BoW features, around 10% lower in accuracy. This is consistent with the experience of the field: methods that do use kNN must be augmented in order to provide competitive performance [2, 37]. But the picture is different for ImageNet7K or ImageNet10K categories, where simple kNN actually outperforms linear SVMs on BoW features (BOW+SVM), with 11-16% higher accuracy. The small absolute ga...
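The "simple kNN" in this snippet compares whole-image feature vectors, unlike NBNN's per-descriptor image-to-class distance. Below is a minimal sketch of such an image-to-image classifier. It assumes:
- Images are already reduced to BoW histograms.
- Distances are squared Euclidean.
- Prediction is a majority vote among the k nearest training images, with ties broken by the nearest neighbor.

The distance, k, and tie rule are our assumptions, not taken from the paper.

```python
from collections import Counter


def knn_classify(query, train, k=3):
    """Image-to-image k-nearest-neighbor classification.

    train: list of (histogram, label) pairs.
    """
    ranked = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(query, item[0])),
    )
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Break ties by walking neighbors nearest-first.
    for _, label in neighbors:
        if votes[label] == top:
            return label


train = [
    ([1.0, 0.0, 0.0], "a"), ([0.9, 0.1, 0.0], "a"),
    ([0.0, 1.0, 0.0], "b"), ([0.0, 0.9, 0.1], "b"),
    ([0.0, 0.0, 1.0], "c"),
]
print(knn_classify([0.8, 0.2, 0.0], train))  # a
```

Training is just storing the examples, so the method scales to many categories without training one classifier per class. The cost moves to nearest-neighbor search at test time.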

Recognition using Regions

by Chunhui Gu, Joseph J. Lim, Pablo Arbeláez, Jitendra Malik
Cited by 106 (5 self)
This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start by producing a robust bag of overlaid regions for each image using Arbeláez et al., CVPR 2009. Each region is represented by a rich set of image cues (shape, color and texture). We then learn region weights using a max-margin framework. In detection and segmentation, we apply a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. The proposed approach significantly outperforms the state of the art on the ETHZ shape database (87.1% average detection rate compared to Ferrari et al.’s 67.2%), and achieves competitive performance on the Caltech 101 database.

Citation Context

...Figure 9: Mean recognition rate (%) over number of training images per category in Caltech 101. With 15 and 30 training images per category, our method outperforms [14, 15, 33, 13] and [19] but not [5]. 6. Conclusion: In this paper, we have presented a unified framework for object detection, segmentation, and classification using regions. Building on a novel region segmentation algorithm which produ...
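The abstract above uses "a generalized Hough voting scheme" to propose object locations. Below is a minimal sketch of voting over 2-D object centers only, under these assumptions:
- Each matched region contributes a weighted vote for a center, at its own position plus a hypothetical stored offset.
- Votes are accumulated in square bins.
- The heaviest bin is returned.

The paper also hypothesizes scale and support, which this sketch leaves out. The weights and offsets here are made up for illustration.

```python
from collections import defaultdict


def hough_vote(matches, bin_size=10.0):
    """Accumulate weighted votes for object centers in square bins.

    matches: list of (region_x, region_y, offset_x, offset_y, weight), where
        the offset points from the region to the predicted object center.
    Returns ((center_x, center_y), score) for the highest-scoring bin.
    """
    acc = defaultdict(float)
    for rx, ry, ox, oy, weight in matches:
        px, py = rx + ox, ry + oy
        acc[(int(px // bin_size), int(py // bin_size))] += weight
    (bx, by), score = max(acc.items(), key=lambda kv: kv[1])
    return ((bx + 0.5) * bin_size, (by + 0.5) * bin_size), score


matches = [(10, 10, 40, 40, 1.0), (60, 20, -8, 32, 0.5), (100, 100, 0, 0, 0.8)]
print(hough_vote(matches))  # ((55.0, 55.0), 1.5)
```

Two regions that agree on roughly the same center outvote a single stronger outlier. In the pipeline the abstract describes, such peaks are hypotheses that a verification classifier then accepts or rejects.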

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University