Harvesting mid-level visual concepts from large-scale internet images (2013)
Venue: CVPR
Citations: 19 (4 self)
Citations
8949 | Distinctive Image Features from Scale-Invariant Keypoints
- Lowe
Citation Context: ...icant improvement over the competing systems in image classification, including those with strong supervision. 1. Introduction The inventions of robust and informative low-level features such as SIFT [18], HOG [4], and LBP [22] have been considered as one of the main advances/causes for the recent success in computer vision. Yet, one of the most fundamental issues in vision remains to be the problem o...
3735 | Histograms of oriented gradients for human detection. CVPR
- Dalal, Triggs
- 2005
Citation Context: ...ovement over the competing systems in image classification, including those with strong supervision. 1. Introduction The inventions of robust and informative low-level features such as SIFT [18], HOG [4], and LBP [22] have been considered as one of the main advances/causes for the recent success in computer vision. Yet, one of the most fundamental issues in vision remains to be the problem of “repres...
2253 | WordNet: A lexical database for English
- Miller
- 1995
Citation Context: ...vel categories. In the following sections, we introduce the details of our scheme. 3.1. Word Selection and Image Collection The literal words are selected from ImageNet [5], which is based on WordNet [19] and Classeme [29]. For the words with similar meanings, e.g., “people”, “guest”, “worker”, and “judge”, we keep the most generic one. In all,...
1920 | Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
- Lazebnik, Schmid, et al.
1413 | Liblinear: A library for large linear classification. JMLR
- Fan, Chang, et al.
- 2008
Citation Context: ...ons), LBP [36] (of 256 dimensions) and the L∗a∗b∗ histogram (of 96 dimensions) as the feature; these features are concatenated, leading to a feature vector of dimension 2400. The toolbox of LIBLINEAR [7] is adopted for efficient training; for each word, five iterations are used in miSVM. To create the visual concepts, on the patches labeled as positive by miSVM, 20 clusters are found using K-means; T...
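The pipeline this context describes (2048-D HOG + 256-D LBP + 96-D L∗a∗b∗ histogram concatenated into a 2400-D patch feature, then 20 K-means clusters over the patches miSVM labels positive) can be sketched as below. The plain Lloyd's-iteration K-means and the random stand-in descriptors are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def concat_features(hog, lbp, lab_hist):
    """Concatenate per-patch descriptors into one 2400-D vector
    (2048-D HOG + 256-D LBP + 96-D L*a*b* histogram)."""
    return np.concatenate([hog, lbp, lab_hist])

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (cluster centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers, labels

# Toy stand-ins for the patches miSVM labeled positive.
rng = np.random.default_rng(1)
patches = np.stack([concat_features(rng.random(2048), rng.random(256), rng.random(96))
                    for _ in range(100)])
centers, labels = kmeans(patches, k=20)   # 20 clusters per word, as in the context
```

Each cluster center would then seed one candidate visual concept for the word.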
839 | ImageNet: a large-scale hierarchical image database
- Deng, Dong
Citation Context: ...few cancer patterns for medical image segmentation; in addition, the lack of explicit competition among clusters leads to poor results in our problem. In terms of large-scale natural images, ImageNet [5] is shown to be a great resource. Here, we find it convenient to directly crawl images from the search engines using word-based queries. 3. Automatic Visual Concept Learning Starting from a pool of wo...
774 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.
- 2004
Citation Context: ...effectiveness of the visual concepts we learned. Finally, the features corresponding to the single-concept classifiers and the multi-cluster visual concepts are combined like multiple kernel learning [2, 13]. The kernels ...
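The combination mentioned here takes the form multiple kernel learning optimizes: a convex combination of base kernel Gram matrices. Per the contexts for [30, 31] below, the weights can be picked by cross-validation instead of an SMO-style MKL solver. A minimal sketch, where the two toy base kernels are assumptions for illustration:

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination K = sum_m w_m * K_m of base Gram matrices,
    as in multiple kernel learning; here the weights would be chosen
    by cross-validation rather than by solving the MKL program."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise to a convex combination
    return sum(wi * Ki for wi, Ki in zip(w, kernels))

rng = np.random.default_rng(0)
X = rng.random((5, 3))                   # toy feature matrix, 5 samples
K1 = X @ X.T                             # linear kernel on one feature set
K2 = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))  # RBF kernel on the same data
K = combine_kernels([K1, K2], [0.7, 0.3])
```

A convex combination of valid kernels is itself a valid kernel, so the combined Gram matrix can be handed directly to any kernel classifier.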
693 | A comparative study of texture measures with classification based on featured distributions
- Ojala, Pietikainen, et al.
- 1996
Citation Context: ...the competing systems in image classification, including those with strong supervision. 1. Introduction The inventions of robust and informative low-level features such as SIFT [18], HOG [4], and LBP [22] have been considered as one of the main advances/causes for the recent success in computer vision. Yet, one of the most fundamental issues in vision remains to be the problem of “representation”, whi...
649 | The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results
- Everingham, Gool, et al.
- 2012
Citation Context: ...dation to determine the best ...
526 | VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org
- Vedaldi, Fulkerson
- 2008
Citation Context: ...ch as the SMO algorithm in [2], we can simply use cross-validation to determine the best ...
496 | Linear Spatial Pyramid Matching Using Sparse Coding for
- Yang, Yu, et al.
- 2009
Citation Context: ...cluster property with 14,200 visual concepts and can model the diversity of the Internet images. We also test vector quan... (Table 2 excerpt, Scene-15 / UIUC-Sport / MIT-Indoor: Object Bank [17]: 80.9% / 76.3% / 37.6%; Yang et al. [35]: 80.4% / - / -; Li et al. [16]: - / 73.4% / -; Singh et al. [28]: - / - / 38%; Pandey et al. [23]: - / - / 43.1%; Quattoni et al. [27]: - / - / 26%; Niu et al. [21]: 78% / 82.5% / -; Wang et al. [33]: 80.43% / - / 33.7%; Kwitt et al. [12]: 82...
445 | Multiple kernel learning, conic duality
- Bach, Lanckriet, et al.
- 2004
Citation Context: ...effectiveness of the visual concepts we learned. Finally, the features corresponding to the single-concept classifiers and the multi-cluster visual concepts are combined like multiple kernel learning [2, 13]. The kernels ...
442 | Locality-constrained linear coding for image classification.
- Wang, Yang, et al.
- 2010
Citation Context: ...uster visual concepts, the mAP is 57.5%, much higher than that of KMS-ALL and KMS-SUB. We also compare our visual concepts with the improved Fisher kernel (FK), locality-constrained linear coding (LLC) [32], and vector quantization (VQ). The Fisher kernel starts from a Gaussian Mixture Model (GMM), and concatenates the average first and second order differences between the patch descriptors and the cent...
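The LLC coding mentioned at the end of this context can be sketched with the standard approximated LLC solution from Wang et al. [32]: take the k nearest codewords, solve a locally regularized least-squares reconstruction with weights summing to one, then max-pool the codes over an image's patches. The codebook size, k, and regularizer below are illustrative assumptions:

```python
import numpy as np

def llc_code(x, codebook, k=5, reg=1e-4):
    """Approximate LLC coding of one descriptor x: reconstruct it from
    its k nearest visual words via a local least-squares fit."""
    d = ((codebook - x) ** 2).sum(1)
    idx = np.argsort(d)[:k]              # k nearest visual words
    B = codebook[idx] - x                # shift neighbours to the origin
    C = B @ B.T + reg * np.eye(k)        # local covariance, regularised
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                         # shift-invariance constraint: sum(w) = 1
    code = np.zeros(len(codebook))
    code[idx] = w                        # sparse code over the full codebook
    return code

rng = np.random.default_rng(0)
codebook = rng.random((50, 16))          # 50 visual words, 16-D toy descriptors
codes = np.stack([llc_code(x, codebook) for x in rng.random((30, 16))])
image_feature = codes.max(0)             # max-pool the reconstruction weights
```

The max-pooled vector is the per-image LLC feature the comparison in the text refers to.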
362 | Improving the fisher kernel for large-scale image classification.
- Perronnin, Sánchez, et al.
- 2010
Citation Context: ...n Mixture Model (GMM), and concatenates the average first and second order differences between the patch descriptors and the centers of the GMM, leading to a feature vector of very high dimension. In [26], the Fisher-kernel is improved by reducing the dimensionality of the patch descriptors using PCA. LLC [32] projects the patch descriptors to the local linear subspaces spanned by some visual words cl...
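The Fisher-vector construction this context describes (soft-assign descriptors to a diagonal-covariance GMM, then accumulate normalized first- and second-order differences, giving a 2·K·D-dimensional vector) might be sketched as follows. The toy GMM parameters stand in for a fitted mixture, and the power/L2 normalization of the improved FK [26] is omitted:

```python
import numpy as np

def fisher_vector(X, pi, mu, sigma2):
    """First/second-order Fisher-vector statistics of descriptors X
    w.r.t. a diagonal-covariance GMM (pi: weights, mu: means,
    sigma2: variances); output has dimension 2*K*D."""
    K, D = mu.shape
    # posterior (soft assignment) of each descriptor to each Gaussian
    logp = (-0.5 * (((X[:, None] - mu) ** 2) / sigma2
                    + np.log(2 * np.pi * sigma2)).sum(-1) + np.log(pi))
    gamma = np.exp(logp - logp.max(1, keepdims=True))
    gamma /= gamma.sum(1, keepdims=True)
    N = len(X)
    fv = []
    for k in range(K):
        g = gamma[:, k:k + 1]
        diff = (X - mu[k]) / np.sqrt(sigma2[k])
        fv.append((g * diff).sum(0) / (N * np.sqrt(pi[k])))                  # 1st order
        fv.append((g * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * pi[k])))  # 2nd order
    return np.concatenate(fv)

rng = np.random.default_rng(0)
X = rng.random((200, 8))                 # e.g. PCA-reduced patch descriptors, as in [26]
K = 4
fv = fisher_vector(X, np.full(K, 1 / K), rng.random((K, 8)), np.ones((K, 8)))
```

With K Gaussians and D-dimensional descriptors the vector has 2·K·D entries, which is the "very high dimension" the context notes; [26] tames it by PCA-reducing the descriptors first.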
347 | Describing objects by their attributes.
- Farhadi, Endres, et al.
- 2009
Citation Context: ... Beyond low-level features, obtaining effective mid-level representations has become increasingly important. For example, there have been many recent efforts made along the line of attribute learning [8, 24, 17]. These approaches, however, are mostly focused on supervised or active learning where a considerable amount of human efforts are required to provide detailed manual annotations. The limitations to th...
314 | Support vector machines for multipleinstance learning.
- Andrews, Tsochantaridis, et al.
- 2003
Citation Context: ...ries, such as “bike”, “bird”, “tree”, allows us to crawl images of high relevance, good quality, and large diversity (at least for the top-ranked ones); (3) the multiple instance learning formulation [1] enables us to exploit common patterns from retrieved images, which have a high degree of relevance to the query words; (4) saliency detection [9] helps to reduce the search space b...
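The miSVM formulation of [1] alternates between training an instance classifier and relabeling the instances inside positive bags, subject to each positive bag keeping at least one positive instance. A minimal sketch, with a numpy logistic-regression classifier standing in for the SVM and synthetic two-dimensional bags as an assumed toy dataset:

```python
import numpy as np

def train_linear(X, y, lr=0.5, epochs=200):
    """Logistic-regression stand-in for the SVM instance classifier."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def mi_svm(bags, bag_labels, iters=5):
    """miSVM-style alternation: every instance starts with its bag's
    label; we then retrain and relabel instances in positive bags,
    forcing at least one positive instance per positive bag."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), l, float)
                        for b, l in zip(bags, bag_labels)])
    for _ in range(iters):
        w = train_linear(X, y)
        start = 0
        for b, l in zip(bags, bag_labels):
            s = X[start:start + len(b)] @ w
            if l == 1:                       # negatives keep their labels fixed
                lab = (s > 0).astype(float)
                if lab.sum() == 0:
                    lab[s.argmax()] = 1.0    # keep one positive per positive bag
                y[start:start + len(b)] = lab
            start += len(b)
    return w, y

rng = np.random.default_rng(0)
# Each positive bag: 1 truly positive patch among 4 background patches.
pos_bags = [np.vstack([rng.normal(2, 1, (1, 2)), rng.normal(-2, 1, (4, 2))])
            for _ in range(10)]
neg_bags = [rng.normal(-2, 1, (5, 2)) for _ in range(10)]
w, y = mi_svm(pos_bags + neg_bags, [1] * 10 + [0] * 10)
```

On this toy data the alternation gradually concentrates the positive labels on the true positive patch in each bag, which is how the paper extracts common patterns from noisily retrieved images.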
291 | Dynamic texture recognition using local binary patterns with an application to facial expressions. Pattern Analysis and Machine Intelligence,
- Zhao, Pietikainen
- 2007
Citation Context: ...en used with the specific models learned from specific image sets, the results can be further improved to a large extent. 4.1. Implementations For each patch, we use HOG [4] (of 2048 dimensions), LBP [36] (of 256 dimensions) and the L∗a∗b∗ histogram (of 96 dimensions) as the feature; these features are concatenated, leading to a feature vector of dimension 2400. The toolbox of LIBLINEAR [7] is adopted...
245 | Efficient additive kernels via explicit feature maps. PAMI,
- Vedaldi, Zisserman
- 2011
Citation Context: ...ch as the SMO algorithm in [2], we can simply use cross-validation to determine the best ...
207 | Object Bank: A highlevel image representation for scene classification & semantic feature sparsification.
- Li, Su, et al.
- 2010
Citation Context: ... Beyond low-level features, obtaining effective mid-level representations has become increasingly important. For example, there have been many recent efforts made along the line of attribute learning [8, 24, 17]. These approaches, however, are mostly focused on supervised or active learning where a considerable amount of human efforts are required to provide detailed manual annotations. The limitations to th...
195 | The devil is in the details: an evaluation of recent feature encoding methods
- Chatfield, Lempitsky, et al.
- 2011
Citation Context: ... max-pooling the reconstruction weights. The improved Fisher-Kernel and LLC stand for the state-of-the-arts. For FK, LLC and VQ, the results reported here are from the image classification toolbox in [3]. In [3], multi-scale dense SIFT descriptors are used as the local features and the ...
178 | Building high-level features using large scale unsupervised learning
- Le, Monga, et al.
Citation Context: ...VM and K-means in a novel way; in addition, our method significantly outperforms [28] with a relative 37% improvement on the MIT-Indoor scene dataset, on which both the approaches have been tested. In [15], high-level features are built from large scale Internet images with nine layers of locally connected sparse autoencoder; however, their autoencoder approach is much more complex than the scheme prop...
167 | Recognizing indoor scenes.
- Quattoni, Torralba
- 2009
Citation Context: ...ernel is used in the experiments, and it can be computed efficiently using the explicit feature map in [31, 30]. 4. Experiments and Results On the PASCAL VOC 2007 [6], scene-15 [14], MIT indoor scene [27], UIUC-Sport [16] and Inria horse [10] image sets, we evaluate the visual concepts learned from the Internet images. On these image sets, the visual concepts achieve the state-of-the-art performances,...
154 | What, where and who? Classifying events by scene and object recognition.
- Li, Fei-Fei
- 2007
Citation Context: ...the experiments, and it can be computed efficiently using the explicit feature map in [31, 30]. 4. Experiments and Results On the PASCAL VOC 2007 [6], scene-15 [14], MIT indoor scene [27], UIUC-Sport [16] and Inria horse [10] image sets, we evaluate the visual concepts learned from the Internet images. On these image sets, the visual concepts achieve the state-of-the-art performances, demonstrating it...
151 | Relative attributes. In:
- Parikh, Grauman
- 2011
Citation Context: ... concepts on various benchmark datasets. 2. Related Works Visual attribute learning has recently attracted a lot of attention. However, many existing algorithms were designed as supervised approaches [8, 24, 17, 25, 17], preventing them from scaling up to deal with a large number of images. A term, “classeme”, was introduced in [29] which also explores Internet images using word-based queries; however, only one clas...
122 | Efficient object category recognition using classemes.
- Torresani, Szummer, et al.
- 2010
Citation Context: ...ever, many existing algorithms were designed as supervised approaches [8, 24, 17, 25, 17], preventing them from scaling up to deal with a large number of images. A term, “classeme”, was introduced in [29] which also explores Internet images using word-based queries; however, only one classeme is learned for each category and the objective of the classeme work is to learn image-level representations. I...
94 | Scene recognition and weakly supervised object localization with deformable part-based models.
- Pandey, Lazebnik
- 2011
Citation Context: ... Internet images. We also test vector quan... (Table 2 excerpt, Scene-15 / UIUC-Sport / MIT-Indoor: Object Bank [17]: 80.9% / 76.3% / 37.6%; Yang et al. [35]: 80.4% / - / -; Li et al. [16]: - / 73.4% / -; Singh et al. [28]: - / - / 38%; Pandey et al. [23]: - / - / 43.1%; Quattoni et al. [27]: - / - / 26%; Niu et al. [21]: 78% / 82.5% / -; Wang et al. [33]: 80.43% / - / 33.7%; Kwitt et al. [12]: 82.3% / 83.0% / 44.0%; KMS-ALL: 78.7% / 81.5% / 38.8%; KMS-SUB: 80.4% / 83.2% / 41.9%; VQ: 82.1% / 85...
82 | Randomized clustering forests for image classification.
- Moosmann, Nowak, et al.
- 2008
Citation Context: ...aining images for testing and run the experiments for 10 rounds. On this image set, the accuracy of our visual concepts is 92.47%, better than the accuracy 91.4% of VQ with 10,000 codes and 85.3% in [20]. 5. Conclusion In this paper, we have introduced a scheme to automatically exploit mid-level representations, called visual concepts, from large-scale Internet images retrieved using word-based queri...
79 | Unsupervised discovery of mid-level discriminative patches.
- Singh, Gupta, et al.
- 2012
Citation Context: ...epts for the purpose of performing general image understanding, which goes out of the scope of classeme [29] as it is computationally prohibitive for [29] to train on a large scale. A recent approach [28] learns “discriminative patches” in an unsupervised manner. However, [28] learns discriminative patches while we focus on dictionary learning for the mid-level representations; [28] uses an iterative ...
75 | Accurate Object Detection with Deformable Shape Models Learnt from Images, In CVPR,
- Ferrari, Jurie, et al.
- 2007
Citation Context: ...it can be computed efficiently using the explicit feature map in [31, 30]. 4. Experiments and Results On the PASCAL VOC 2007 [6], scene-15 [14], MIT indoor scene [27], UIUC-Sport [16] and Inria horse [10] image sets, we evaluate the visual concepts learned from the Internet images. On these image sets, the visual concepts achieve the state-of-the-art performances, demonstrating its good cross-dataset ...
69 | Interactively building a discriminative vocabulary of nameable attributes
- Parikh, Grauman
- 2011
Citation Context: ... Beyond low-level features, obtaining effective mid-level representations has become increasingly important. For example, there have been many recent efforts made along the line of attribute learning [8, 24, 17]. These approaches, however, are mostly focused on supervised or active learning where a considerable amount of human efforts are required to provide detailed manual annotations. The limitations to th...
30 | Salient object detection by composition,”
- Feng, Wei, et al.
- 2011
Citation Context: ...ones); (3) the multiple instance learning formulation [1] enables us to exploit common patterns from retrieved images, which have a high degree of relevance to the query words; (4) saliency detection [9] helps to reduce the search space by finding potential candidates. The main contributions of this paper thus include the following aspects: (1) we emphasize the importance of automa...
17 | Context aware topic model for scene recognition.
- Niu, Hua, et al.
- 2012
Citation Context: ...(Table 2 excerpt, Scene-15 / UIUC-Sport / MIT-Indoor: Object Bank [17]: 80.9% / 76.3% / 37.6%; Yang et al. [35]: 80.4% / - / -; Li et al. [16]: - / 73.4% / -; Singh et al. [28]: - / - / 38%; Pandey et al. [23]: - / - / 43.1%; Quattoni et al. [27]: - / - / 26%; Niu et al. [21]: 78% / 82.5% / -; Wang et al. [33]: 80.43% / - / 33.7%; Kwitt et al. [12]: 82.3% / 83.0% / 44.0%; KMS-ALL: 78.7% / 81.5% / 38.8%; KMS-SUB: 80.4% / 83.2% / 41.9%; VQ: 82.1% / 85.6% / 47.6%; VC: 83.4% / 84.8% / 46.4%; VC+VQ: 85.4% / 88.4% / 52.3%) T...
15 | Comparative evaluation of binary features.
- Heinly, Dunn, et al.
- 2012
Citation Context: ...tributes is often intrinsically ambiguous, (3) the number of attributes and training images are hard to scale. Some other methods in which detailed manual annotations are not required (e.g. classemes [11]) however are not designed to build a dictionary of mid-level representations. In this paper, we propose a scheme to build a path from words to visual concepts; using this scheme, effective mid-level r...
14 | Multiple clustered instance learning for histopathology cancer image classification, segmentation and clustering. In:
- Xu, Zhu, et al.
- 2012
Citation Context: ...aper. In [37], saliency detection is utilized to create bags of image patches, but only one object is assumed in each image for the task of object discovery. Although multiple clusters are learned in [34], its goal is to identify a few cancer patterns for medical image segmentation; in addition, the lack of explicit competition among clusters leads to poor results in our problem. In terms of large-sca...
12 | Scene recognition on the semantic manifold.
- Vasconcelos, Rasiwasia
- 2012
Citation Context: ...(Table 2 excerpt, Scene-15 / UIUC-Sport / MIT-Indoor: Yang et al. [35]: 80.4% / - / -; Li et al. [16]: - / 73.4% / -; Singh et al. [28]: - / - / 38%; Pandey et al. [23]: - / - / 43.1%; Quattoni et al. [27]: - / - / 26%; Niu et al. [21]: 78% / 82.5% / -; Wang et al. [33]: 80.43% / - / 33.7%; Kwitt et al. [12]: 82.3% / 83.0% / 44.0%; KMS-ALL: 78.7% / 81.5% / 38.8%; KMS-SUB: 80.4% / 83.2% / 41.9%; VQ: 82.1% / 85.6% / 47.6%; VC: 83.4% / 84.8% / 46.4%; VC+VQ: 85.4% / 88.4% / 52.3%) Table 2. The classification accuracies on the scene datasets. t...
11 | Unsupervised object class discovery via saliency-guided multiple class learning
- Zhu, Wu, et al.
Citation Context: ...re built from large scale Internet images with nine layers of locally connected sparse autoencoder; however, their autoencoder approach is much more complex than the scheme proposed in this paper. In [37], saliency detection is utilized to create bags of image patches, but only one object is assumed in each image for the task of object discovery. Although multiple clusters are learned in [34], its goa...
1 | Learning sparse covariance patterns for natural scenes
- Wang, Li, et al.
- 2012
Citation Context: ...(Table 2 excerpt, Scene-15 / UIUC-Sport / MIT-Indoor: Object Bank [17]: 80.9% / 76.3% / 37.6%; Yang et al. [35]: 80.4% / - / -; Li et al. [16]: - / 73.4% / -; Singh et al. [28]: - / - / 38%; Pandey et al. [23]: - / - / 43.1%; Quattoni et al. [27]: - / - / 26%; Niu et al. [21]: 78% / 82.5% / -; Wang et al. [33]: 80.43% / - / 33.7%; Kwitt et al. [12]: 82.3% / 83.0% / 44.0%; KMS-ALL: 78.7% / 81.5% / 38.8%; KMS-SUB: 80.4% / 83.2% / 41.9%; VQ: 82.1% / 85.6% / 47.6%; VC: 83.4% / 84.8% / 46.4%; VC+VQ: 85.4% / 88.4% / 52.3%) Table 2. The classification ac...