Results 1 - 10 of 102
Segmentation as Selective Search for Object Recognition
Cited by 165 (7 self)
For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by reconsidering segmentation: we propose to generate many approximate locations over few and precise object delineations because (1) an object whose location is never generated cannot be recognised and (2) appearance and immediate nearby context are most effective for object recognition. Our method is class-independent and is shown to cover 96.7% of all objects in the Pascal VOC 2007 test set using only 1,536 locations per image. Our selective search enables the use of the more expensive bag-of-words method, which we use to substantially improve the state-of-the-art by up to 8.5% for 8 out of 20 classes on the Pascal VOC 2010 detection challenge.
Large-scale image classification: fast feature extraction and svm training
In IEEE Conference on Computer Vision and Pattern Recognition, 2011
Cited by 69 (8 self)
Most research efforts on image classification so far have focused on medium-scale datasets, often defined as datasets that can fit into the memory of a desktop machine (typically 4 GB to 48 GB). There are two main reasons for the limited effort on large-scale image classification. First, until the emergence of the ImageNet dataset, there was almost no publicly available large-scale benchmark for image classification, mostly because class labels are expensive to obtain. Second, large-scale classification is hard because it poses more challenges than its medium-scale counterpart. A key challenge is achieving efficiency in both feature extraction and classifier training without compromising performance. This paper shows how we address this challenge using the ImageNet dataset as an example. For feature extraction, we develop a Hadoop scheme that performs feature extraction in parallel using hundreds of mappers, which allows us to extract fairly sophisticated features (with dimensions in the hundreds of thousands) from 1.2 million images within one day. For SVM training, we develop a parallel averaging stochastic gradient descent (ASGD) algorithm for training one-against-all 1000-class SVM classifiers. The ASGD algorithm is capable of dealing with terabytes of training data and converges very fast: typically 5 epochs are sufficient. As a result, we achieve state-of-the-art performance on ImageNet 1000-class classification, i.e., 52.9% classification accuracy and a 71.8% top-5 hit rate.
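The abstract describes averaged SGD for one-against-all SVM training only at a high level. As a hedged illustration, the sketch below shows the core averaged-SGD update for a single binary hinge-loss classifier; the function name, step-size schedule, and toy data are assumptions for illustration, not the paper's parallel implementation:

```python
import numpy as np

def asgd_hinge(X, y, lam=1e-4, epochs=5, seed=0):
    """Averaged SGD for a binary hinge-loss SVM (one classifier of a
    one-against-all setup). Returns the Polyak-Ruppert averaged weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)       # current iterate
    w_avg = np.zeros(d)   # running average of all iterates
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)           # common 1/(lambda*t) schedule
            w *= (1.0 - eta * lam)          # shrinkage from the L2 regularizer
            if y[i] * X[i].dot(w) < 1.0:    # hinge subgradient is active
                w += eta * y[i] * X[i]
            w_avg += (w - w_avg) / t        # incremental average
    return w_avg

# toy usage: two roughly separable clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w = asgd_hinge(X, y)
acc = np.mean(np.sign(X.dot(w)) == y)
```

In a one-against-all setup, this routine would be run once per class, with `y` encoding membership in that class; the averaging is what lets the noisy iterates settle quickly, consistent with the few-epoch convergence the abstract reports.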
Ask the locals: multi-way local pooling for image recognition
In ICCV’11, IEEE, 2011
Good Practice in Large-Scale Learning for Image Classification
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013
Cited by 53 (6 self)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multi-class, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning the optimal rebalancing of positive and negative examples through cross-validation can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these “good practices”, we were able to improve the state-of-the-art on a large subset of ImageNet with 10K classes and 9M images from 16.7% top-1 accuracy to 19.1%.
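The rebalancing of positive and negative examples mentioned above can be made concrete with a small sketch: a hinge-loss SGD trainer in which positive examples are up-weighted by a factor selected on held-in data. The factor candidates, step-size schedule, and toy problem are illustrative assumptions, not the paper's exact cross-validation protocol:

```python
import numpy as np

def sgd_hinge_weighted(X, y, beta=1.0, lam=1e-3, epochs=10, seed=0):
    """Hinge-loss SGD in which positive examples are up-weighted by
    `beta`, the rebalancing factor to be tuned."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * (t + 1))
            c = beta if y[i] > 0 else 1.0   # class rebalancing weight
            w *= (1.0 - eta * lam)
            if y[i] * X[i].dot(w) < 1.0:
                w += eta * c * y[i] * X[i]
    return w

# imbalanced toy problem: 10 positives vs 200 negatives
rng = np.random.default_rng(0)
Xp = rng.normal(1.5, 1.0, (10, 2))
Xn = rng.normal(-1.5, 1.0, (200, 2))
X = np.vstack([Xp, Xn])
y = np.hstack([np.ones(10), -np.ones(200)])

def balanced_acc(w):
    """Average of per-class accuracies, insensitive to class imbalance."""
    pred = np.sign(X.dot(w))
    return 0.5 * (np.mean(pred[y > 0] == 1) + np.mean(pred[y < 0] == -1))

# hypothetical sweep over two candidate factors: no rebalancing, or
# the negative-to-positive ratio
best_beta = max([1.0, len(Xn) / len(Xp)],
                key=lambda b: balanced_acc(sgd_hinge_weighted(X, y, beta=b)))
bacc = balanced_acc(sgd_hinge_weighted(X, y, beta=best_beta))
```

In practice the factor would be chosen on a held-out split rather than the training data as in this toy; the point is only that the weight `c` enters the subgradient step, not the regularizer.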
Contextualizing object detection and classification
In CVPR, 2011
Cited by 49 (6 self)
In this paper, we investigate how to iteratively and mutually boost object classification and detection by taking the outputs of one task as the context for the other. First, instead of intuitive feature and context concatenation or postprocessing with context, we propose the so-called Contextualized Support Vector Machine (Context-SVM), where the context takes responsibility for dynamically adjusting the classification hyperplane, yielding a context-adaptive classifier. Then, an iterative training procedure is presented: in each step, a Context-SVM, associated with the output context from one task (object classification or detection), is instantiated to boost performance on the other task, whose augmented outputs are then used in turn to improve the former task. The proposed solution is evaluated on the object classification and detection tasks of the PASCAL Visual Object Challenge (VOC) 2007 and 2010, and achieves state-of-the-art performance.
Fast and balanced: Efficient label tree learning for large scale object recognition
In NIPS, 2011
Cited by 40 (4 self)
We present a novel approach to efficiently learn a label tree for large-scale classification with many classes. The key contribution is a technique that simultaneously determines the structure of the tree and learns the classifiers for each node. The approach also allows fine-grained control over the efficiency vs. accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large-scale image classification with 10,184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency, with less training time and more balanced trees, compared to the previous state of the art by Bengio et al.
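A label tree of the kind described can be sketched as a small data structure: each node scores its children with linear classifiers, and inference descends greedily from root to leaf, so test cost scales with tree depth rather than the number of classes. The class structure, weights, and API below are illustrative assumptions, not the learned trees of the paper:

```python
import numpy as np

class LabelTreeNode:
    """One node of a label tree: each child covers a subset of the class
    labels and has a linear scorer deciding whether to descend into it."""
    def __init__(self, label_sets, child_weights, children=None):
        self.label_sets = label_sets        # one label subset per child
        self.child_weights = child_weights  # (num_children, d) linear scorers
        self.children = children            # sub-nodes, or None at a leaf

def predict(node, x):
    """Greedy root-to-leaf descent: pick the best-scoring child at each
    node; at a leaf each child's subset is a single label."""
    while True:
        best = int(np.argmax(node.child_weights.dot(x)))
        if node.children is None:
            return node.label_sets[best][0]
        node = node.children[best]

# hypothetical 4-class example with a depth-2 balanced tree:
# the root separates {0, 1} from {2, 3} on x[0]; each leaf splits on x[1]
root = LabelTreeNode(
    label_sets=[[0, 1], [2, 3]],
    child_weights=np.array([[-1.0, 0.0], [1.0, 0.0]]),
    children=[
        LabelTreeNode([[0], [1]], np.array([[0.0, -1.0], [0.0, 1.0]])),
        LabelTreeNode([[2], [3]], np.array([[0.0, -1.0], [0.0, 1.0]])),
    ],
)
label = predict(root, np.array([-1.0, 1.0]))
```

With a balanced tree over K classes, each prediction evaluates only O(log K) sets of child scorers, which is the efficiency side of the trade-off the abstract refers to.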
A codebook-free and annotation-free approach for fine-grained image categorization
In CVPR, 2012
Cited by 28 (3 self)
Fine-grained categorization refers to the task of classifying objects that belong to the same basic-level class (e.g., different bird species) and share similar shapes or visual appearances. Most state-of-the-art basic-level object classification algorithms have difficulties with this challenging problem. One reason can be attributed to the popular codebook-based image representation, which often loses the subtle image information that is critical for fine-grained classification. Another way to address the problem is to introduce human annotations of object attributes or key points, a tedious process that is also difficult to generalize to new tasks. In this work, we propose a codebook-free and annotation-free approach for fine-grained image categorization. Instead of using vector-quantized codewords, we obtain an image representation by running a high-throughput template-matching process using a large number of randomly generated image templates. We then propose a novel bagging-based algorithm to build a final classifier by aggregating a set of discriminative yet largely uncorrelated classifiers. Experimental results show that our method outperforms state-of-the-art classification approaches on the Caltech-UCSD Birds dataset.
Dog Breed Classification Using Part Localization
Cited by 21 (3 self)
We propose a novel approach to fine-grained image classification in which instances from different classes share common parts but have wide variation in shape and appearance. We use dog breed identification as a test case to show that extracting corresponding parts improves classification performance. This domain is especially challenging since the appearance of corresponding parts can vary dramatically, e.g., the faces of bulldogs and beagles are very different. To find accurate correspondences, we build exemplar-based geometric and appearance models of dog breeds and their face parts. Part correspondence allows us to extract and compare descriptors at like image locations. Our approach also features a hierarchy of parts (e.g., face and eyes) and breed-specific part localization. We achieve a 67% recognition rate on a large real-world dataset of 133 dog breeds and 8,351 images, and experimental results show that accurate part localization significantly increases classification performance compared to state-of-the-art approaches.
Large-scale image classification with trace-norm regularization
In IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2012
Cited by 19 (2 self)
With the advent of larger image classification datasets such as ImageNet, designing scalable and efficient multi-class classification algorithms is now an important challenge. We introduce a new scalable learning algorithm for large-scale multi-class image classification, based on the multinomial logistic loss and the trace-norm regularization penalty. Reframing the challenging non-smooth optimization problem as a surrogate infinite-dimensional optimization problem with a regular ℓ1-regularization penalty, we propose a simple and provably efficient accelerated coordinate descent algorithm. Furthermore, we show how to perform efficient matrix computations in the compressed domain for quantized dense visual features, scaling up to 100,000s of examples, 1,000s-dimensional features, and 100s of categories. Promising experimental results on the “Fungus”, “Ungulate”, and “Vehicles” subsets of ImageNet are presented, where we show that our approach performs significantly better than state-of-the-art approaches for Fisher vectors with 16 Gaussians.
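The trace-norm penalty encourages low-rank weight matrices. Although the paper's algorithm is an accelerated coordinate descent on a surrogate ℓ1 problem, the effect of the regularizer is easiest to see through its proximal operator, which soft-thresholds singular values; the sketch below illustrates the penalty only, not the paper's method:

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Proximal operator of tau * ||W||_* (nuclear norm): shrink each
    singular value of W toward zero by tau, zeroing the small ones."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # U * s_shrunk multiplies columns of U, i.e. U @ diag(s_shrunk)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))          # e.g. a (features x classes) weight matrix
W_shrunk = prox_trace_norm(W, tau=1.0)

# the nuclear norm (sum of singular values) strictly decreases
nuc_before = np.linalg.svd(W, compute_uv=False).sum()
nuc_after = np.linalg.svd(W_shrunk, compute_uv=False).sum()
```

With a large enough threshold only the dominant singular directions survive, which is how the penalty trades accuracy for a compact, low-rank classifier.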