Harvesting mid-level visual concepts from large-scale internet images. In: CVPR (2013)

by Q Li, J Wu, Z Tu
Results 11 - 19 of 19

Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images

by Gong Cheng, Junwei Han, Lei Guo, Zhenbao Liu, Shuhui Bu, Jinchang Ren

Learning Important Spatial Pooling Regions for Scene Classification

by Di Lin, Cewu Lu, Renjie Liao, Jiaya Jia
Abstract
We address the false response influence problem when learning and applying discriminative parts to construct the mid-level representation in scene classification. It is often caused by the complexity of latent image structure when convolving part filters with input images. This problem makes the mid-level representation, even after pooling, not distinct enough to classify input data correctly into categories. Our solution is to learn important spatial pooling regions along with their appearance. The experiments show that this new framework suppresses false responses and produces improved results on several datasets, including MIT-Indoor, 15-Scene, and UIUC 8-Sport. When combined with global image features, our method achieves state-of-the-art performance on these datasets.

Citation Context

...of scene, exploiting them drew much attention recently. This type of methods can be understood in three ways. First, distinct power of learned parts is used to alleviate visual ambiguity. Recent work [8, 24, 15, 16, 27, 17] discovered parts with specific visual concepts – that is, the learned part is expected to represent a cluster of visual objects. Second, unsupervised discovery of discriminative parts is dominating. ...

Harvesting Motion Patterns in Still Images from the Internet

by Jiajun Wu, Yining Wang, Zhulin Li, Zhuowen Tu
Abstract
Most vision research on motion analysis focuses on learning human actions from video clips. In this paper, we investigate the use of still images, rather than videos, for motion recognition. We present evidence from both human cognition and computer vision that still images do indeed contain a wealth of information about motion patterns. Our contributions are threefold. First, we automatically determine classes of motions that can effectively be characterized by still images. To make this determination we introduce the notions of motion verbs (M-verbs) and motion phrases (M-phrases); these refer to linguistic concepts motivated by visual cognition and are not restricted only to motions performed by humans. Second, we build UCSD-1024, a large dataset distilled from more than two million still images. These images come from 1,024 categories of motion; we use crowdsourcing to provide human validation of the motion categories. Third, we exploit motion patterns from UCSD-1024 using a weakly-supervised learning strategy and demonstrate performance competitive with state-of-the-art computer vision action classification methods.

Citation Context

...s dictionary, in concert with the Google and Bing image search engines, to build UCSD-1024. Learning mid-level representations is a popular topic in computer vision (Lim, Zitnick, & Dollár, 2013; Q. Li, Wu, & Tu, 2013). Here we learn a dictionary for motions using a hierarchical model based on mid-level representations on an eighty motion subset of UCSD-1024 that we henceforth refer to as UCSD-80. We then perform ...

FAME: Face Association through Model Evolution

by Eren Golge, Pinar Duygulu
Abstract
We attack the problem of learning face models for public faces from weakly-labelled images collected from the web by querying a name. The data is very noisy even after face detection, with several irrelevant faces corresponding to other people. We propose a novel method, Face Association through Model Evolution (FAME), which prunes the data in an iterative way so that the face models associated with a name can evolve. The idea is based on capturing the discriminativeness and representativeness of each instance and eliminating the outliers. The final models are used to classify faces on novel datasets with possibly different characteristics. On benchmark datasets, our results are comparable to or better than state-of-the-art studies for the task of face identification.

Citation Context

...on sense knowledge using web search results. Discovering representative and discriminative instances: Our method is also related to the recently emerged studies in discovering discriminative patches [25, 22, 38, 11, 10, 20, 12, 21]. In [38], discriminative patches in images are discovered through an iterative method which alternates between clustering and training discriminative classifiers. Li et al. [25] solves same problem w...

CNN Features off-the-shelf: an Astounding Baseline for Recognition

by Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson
Abstract
Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network, which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle a diverse range of recognition tasks: object image classification, scene recognition, fine-grained recognition, attribute detection, and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Remarkably, we report better or competitive results compared to the state of the art in all the tasks on various datasets. The results are achieved using a linear SVM classifier applied to a feature representation of size 4096 extracted from a layer in the net. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual classification tasks.

Citation Context

...on the 67 MIT classes. It has a strong diagonal. The few relatively bright off-diagonal ...

Method            mean Accuracy
ROI + Gist [34]   26.05
DPM [28]          30.40
Object Bank [23]  37.60
RBoW [29]         37.93
BoP [20]          46.10
miSVM [24]        46.40
D-Parts [38]      51.40
IFV [20]          60.77
MLrep [10]        64.03
CNN-SVM           58.44

Table 2: MIT-67 indoor scenes dataset.

It should be noted the MLrep [10] takes weeks to train various part classifiers and the IFV ...

ConceptLearner: Discovering Visual Concepts from Weakly Labeled Image Collections

by Bolei Zhou, Vignesh Jagadeesh, Robinson Piramuthu
Abstract
Discovering visual knowledge from weakly labeled data is crucial to scale up computer vision recognition systems, since it is expensive to obtain fully labeled data for a large number of concept categories. In this paper, we propose ConceptLearner, a scalable approach to discover visual concepts from weakly labeled image collections. Thousands of visual concept detectors are learned automatically, without a human in the loop for additional annotation. We show that these learned detectors can be applied to recognize concepts at the image level and to detect concepts at the image-region level accurately. Under domain-specific supervision, we further evaluate the learned concepts for scene recognition on the SUN database and for object detection on Pascal VOC 2007. ConceptLearner shows promising performance compared to fully supervised and weakly supervised methods.

Citation Context

...d labels instances of the given visual categories; LEVAN [10] harvests keywords from Google Ngram and uses them as structured queries to retrieve all the relevant diverse instances about one concept; [22] proposes a multiple instance learning algorithm to learn mid-level visual concepts from image query results. There are alternative approaches of discovering visual patterns from weakly labeled data t...

Rectifying Self Organizing Maps for Automatic Concept Learning from Web Images

by Eren Golge, Pinar Duygulu
Abstract
We attack the problem of learning concepts automatically from noisy web image search results. Going beyond low-level attributes, such as colour and texture, we explore weakly-labelled datasets for the learning of higher-level concepts, such as scene categories. The idea is based on discovering common characteristics shared among subsets of images by posing a method that is able to organise the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Rectifying Self Organizing Maps (RSOM). Given an image collection returned for a concept query, RSOM provides clusters pruned of outliers. Each cluster is used to train a model representing a different characteristic of the concept. The proposed method outperforms the state-of-the-art studies on the task of learning low-level concepts, and it is competitive in learning higher-level concepts as well. It is capable of working at large scale with no supervision by exploiting the available sources.
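The organize-then-prune idea in this abstract can be illustrated with a tiny sketch. Note this is not RSOM itself (which organizes the data with a self-organizing map); it is a generic robust-distance pruning on toy 2-D features, with all names and numbers chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for features of images returned by a noisy web query:
# most samples depict the queried concept, a few are irrelevant hits.
concept = rng.normal(loc=0.0, scale=0.5, size=(60, 2))  # relevant images
noise = rng.uniform(5, 10, size=(6, 2))                 # irrelevant hits
X = np.vstack([concept, noise])

# Robust center and spread (median-based, so the outliers themselves
# barely influence the estimates used to reject them).
center = np.median(X, axis=0)
d = np.linalg.norm(X - center, axis=1)
mad = np.median(np.abs(d - np.median(d)))

# Keep samples within a few robust deviations of the center; the
# survivors would then be used to train the concept model.
keep = d < np.median(d) + 6 * mad
print(f"kept {keep.sum()} of {len(X)} samples")
```

The same prune-then-train loop generalizes to multiple clusters, each capturing a different characteristic of the concept, which is closer to what RSOM actually does.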

Scalable Similarity Learning using Large Margin Neighborhood Embedding

by unknown authors
Abstract not found

Citation Context

...ion As the number of digital images generated and uploaded to the Internet skyrockets, automatic categorization of large-scale image sets with diversified contents has become a popular research topic [1, 2, 3]. The conventional approach to train a classifier for each class using one-versus-all paradigm is usually unscalable to such a large number of images and classes, not to mention that the sizes of most...

Joint Cosegmentation and Cosketch by Unsupervised Learning (for review, IEEE Transactions on Pattern Analysis and Machine Intelligence)

by Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-chun Zhu
Abstract
Cosegmentation refers to the problem of segmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align the common objects in these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, coupling cosegmentation with what we call “cosketch.” The goal of cosketch is to automatically discover a codebook of sketch templates shared by the input images. The sketch template has a hierarchical compositional structure in which a large structured representational unit is composed of deformable smaller units. These sketch templates capture distinct image patterns, and each template is matched to similar image patches in different images. The cosketches align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation, and an unsupervised learning algorithm that performs both by energy minimization. In experiments, we apply the proposed method to public benchmarks on cosegmentation, including MSRC, iCoseg and ImageNet. We also test our method on a new dataset called Coseg-Rep, where cosegmentation can be performed on a single image with repetitive patterns.

Citation Context

...d active basis. Active basis models are learned from images that are roughly aligned. This work is also related to [27]–[33], where repeated patterns are learned from unaligned input images. In [30]–[32], a set of HOG or active basis templates are learned from multiple input images of the same object category. In [33], recurring tuples of visual words are extracted from single image with repetitive p...
