Hierarchical semantic indexing for large scale image retrieval. (2011)

by J Deng, A Berg, F-F Li
Venue: In CVPR
Results 1 - 10 of 44

A survey on metric learning for feature vectors and structured data

by A. Bellet, Amaury Habrard, Marc Sebban, 2014
"... ..."
Cited by 35 (2 self)
Abstract not found

IntentSearch: Capturing User Intention for One-Click Internet Image Search

by Xiaoou Tang, Ke Liu, Jingyu Cui, Fang Wen, Xiaogang Wang, 2012
"... Abstract—Web-scale image search engines (e.g. Google Image Search, Bing Image Search) mostly rely on surrounding text features. It is difficult for them to interpret users ’ search intention only by query keywords and this leads to ambiguous and noisy search results which are far from satisfactory. ..."
Cited by 22 (7 self)
Abstract—Web-scale image search engines (e.g. Google Image Search, Bing Image Search) mostly rely on surrounding text features. It is difficult for them to interpret users’ search intention only by query keywords and this leads to ambiguous and noisy search results which are far from satisfactory. It is important to use visual information in order to solve the ambiguity in text-based image retrieval. In this paper, we propose a novel Internet image search approach. It only requires the user to click on one query image with the minimum effort and images from a pool retrieved by text-based search are re-ranked based on both visual and textual content. Our key contribution is to capture the users’ search intention from this one-click query image in four steps. (1) The query image is categorized into one of the predefined adaptive weight categories, which reflect users’ search intention at a coarse level. Inside each category, a specific weight schema is used to combine visual features adaptive to this kind of images to better re-rank the text-based search result. (2) Based on the visual content of the query image selected by the user and through image clustering, query keywords are expanded to capture user intention. (3) Expanded keywords are used to enlarge the image pool to contain more relevant images. (4) Expanded keywords are also used to expand the query image to multiple positive visual examples from which new query specific visual and textual similarity metrics are learned to further improve content-based image re-ranking. All these steps are automatic without extra effort from the user. This is critically important for any commercial web-based image search engine, where the user interface has to be extremely simple. Besides this key contribution, a set of visual features which are both effective and efficient in Internet image search are designed. Experimental evaluation shows that our approach significantly improves the precision of top ranked images and also the user experience.

Citation Context

... visual similarities which well reflect the semantic relevance of images. Image similarities can be learned from a large training set where the relevance of pairs of images is known [18]. Deng et al. [19] learned visual similarities from a hierarchical structure defined on semantic attributes of training images. Since web images are highly diversified, defining a set of attributes with hierarchical re...
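To make the adaptive-weight re-ranking step from the abstract above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the category names, weight values, and precomputed per-feature similarities are illustrative assumptions.

```python
import numpy as np

# Hypothetical adaptive weight schemas: one weight vector per coarse intention
# category (names and values are made up for illustration).
WEIGHT_SCHEMAS = {
    "portrait": {"color": 0.2, "texture": 0.2, "face": 0.5, "text": 0.1},
    "scenery":  {"color": 0.5, "texture": 0.3, "face": 0.0, "text": 0.2},
    "object":   {"color": 0.3, "texture": 0.4, "face": 0.0, "text": 0.3},
}

def rerank(candidates, query_category):
    """Re-rank text-retrieved candidates by a weighted sum of per-feature
    similarities to the one-click query image.

    `candidates` is a list of dicts holding precomputed similarities, e.g.
    {"id": "img42", "color": 0.8, "texture": 0.6, "face": 0.1, "text": 0.7}.
    """
    weights = WEIGHT_SCHEMAS[query_category]
    scored = [
        (sum(weights[k] * c[k] for k in weights), c["id"]) for c in candidates
    ]
    return [img_id for _, img_id in sorted(scored, reverse=True)]

if __name__ == "__main__":
    pool = [
        {"id": "a", "color": 0.9, "texture": 0.2, "face": 0.0, "text": 0.5},
        {"id": "b", "color": 0.4, "texture": 0.8, "face": 0.0, "text": 0.9},
    ]
    print(rerank(pool, "scenery"))
```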

Street-to-Shop: Cross-Scenario Clothing Retrieval via Parts Alignment and Auxiliary Set

by Si Liu, et al., 2012
"... In this paper, we address a practical problem of cross-scenario clothing retrieval- given a daily human photo cap-tured in general environment, e.g., on street, finding simi-lar clothing in online shops, where the photos are captured more professionally and with clean background. There are large dis ..."
Cited by 21 (6 self)
In this paper, we address a practical problem of cross-scenario clothing retrieval: given a daily human photo captured in a general environment, e.g., on the street, finding similar clothing in online shops, where the photos are captured more professionally and with clean background. There are large discrepancies between the daily photo scenario and the online shopping scenario. We first propose to alleviate the human pose discrepancy by locating 30 human parts detected by a well trained human detector. Then, founded on part features, we propose a two-step calculation to obtain more reliable one-to-many similarities between the query daily photo and online shopping photos: 1) the within-scenario one-to-many similarities between a query daily photo and the auxiliary set are derived by direct sparse reconstruction; and 2) by a cross-scenario many-to-many similarity transfer matrix inferred offline from an extra auxiliary set and the online shopping set, the reliable cross-scenario one-to-many similarities between the query daily photo and all online shopping photos are obtained. We collect a large online shopping dataset and a daily photo dataset, both of which are thoroughly labeled with 15 clothing attributes via Mechanical Turk. The extensive experimental evaluations on the collected datasets well demonstrate the effectiveness of the proposed framework for cross-scenario clothing retrieval.
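A rough sketch of the two-step similarity computation described above, using a non-negative Lasso for the sparse reconstruction step. The feature dimensions, the transfer matrix, and the function names are assumptions for illustration, not the paper's code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def within_scenario_similarity(query_feat, aux_feats, alpha=0.01):
    """Sparse-reconstruct the query part-feature vector from the auxiliary
    daily-photo set; the non-negative coefficients serve as one-to-many
    similarities. aux_feats: (n_aux, d), query_feat: (d,)."""
    lasso = Lasso(alpha=alpha, positive=True, max_iter=5000)
    lasso.fit(aux_feats.T, query_feat)          # columns = auxiliary images
    return lasso.coef_                          # (n_aux,)

def cross_scenario_similarity(query_feat, aux_feats, transfer):
    """Push within-scenario similarities through an offline auxiliary-to-shop
    transfer matrix of shape (n_aux, n_shop) to score all shop photos."""
    s_aux = within_scenario_similarity(query_feat, aux_feats)
    return s_aux @ transfer                     # (n_shop,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aux = rng.standard_normal((50, 128))        # 50 auxiliary daily photos
    shop_transfer = rng.random((50, 200))       # toy transfer matrix
    query = aux[3] + 0.05 * rng.standard_normal(128)
    scores = cross_scenario_similarity(query, aux, shop_transfer)
    print(scores.argsort()[::-1][:5])           # top-5 shop photos
```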

Learning to Place New Objects in a Scene

by Yun Jiang, Marcus Lim, Changxi Zheng, Ashutosh Saxena
"... Abstract—Placing is a necessary skill for a personal robot to have in order to perform tasks such as arranging objects in a disorganized room. The object placements should not only be stable but also be in their semantically preferred placing areas and orientations. This is challenging because an en ..."
Cited by 20 (10 self)
Abstract—Placing is a necessary skill for a personal robot to have in order to perform tasks such as arranging objects in a disorganized room. The object placements should not only be stable but also be in their semantically preferred placing areas and orientations. This is challenging because an environment can have a large variety of objects and placing areas that may not have been seen by the robot before. In this paper, we propose a learning approach for placing multiple objects in different placing areas in a scene. Given point-clouds of the objects and the scene, we design appropriate features and use a graphical model to encode various properties, such as the stacking of objects, stability, object-area relationship and common placing constraints. The inference in our model is an integer linear program, which we solve efficiently via an LP relaxation. We extensively evaluate our approach on 98 objects from 16 categories being placed into 40 areas. Our robotic experiments show a success rate of 98% in placing known objects and 82% in placing new objects stably. We use our method on our robots for performing tasks such as loading several dish-racks, a bookshelf and a fridge with multiple items.
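The abstract's inference step, an integer linear program solved through an LP relaxation, can be illustrated with a small toy model. The score matrix, the capacity constraints, and the greedy rounding below are simplifying assumptions, not the paper's graphical-model formulation.

```python
import numpy as np
from scipy.optimize import linprog

def place_objects(score, capacity):
    """LP relaxation of a placement ILP: x[o, p] in {0, 1} says object o goes
    to area p; maximize total placement score s.t. each object is placed once
    and each area holds at most capacity[p] objects. Fractional solutions are
    rounded greedily here (a stand-in for the paper's inference)."""
    n_obj, n_area = score.shape
    c = -score.reshape(-1)                        # linprog minimizes
    # Each object placed exactly once.
    A_eq = np.zeros((n_obj, n_obj * n_area))
    for o in range(n_obj):
        A_eq[o, o * n_area:(o + 1) * n_area] = 1.0
    b_eq = np.ones(n_obj)
    # Area capacity constraints.
    A_ub = np.zeros((n_area, n_obj * n_area))
    for p in range(n_area):
        A_ub[p, p::n_area] = 1.0
    b_ub = np.asarray(capacity, dtype=float)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0.0, 1.0))
    x = res.x.reshape(n_obj, n_area)
    return x.argmax(axis=1)                       # rounded assignment

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    scores = rng.random((5, 3))                   # 5 objects, 3 placing areas
    print(place_objects(scores, capacity=[2, 2, 2]))
```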

Meta-class features for large-scale object categorization on a budget. In CVPR

by Alessandro Bergamo, Lorenzo Torresani, 2012
"... Abstract In this paper we introduce a novel image descriptor enabling accurate object categorization even with linear models. Akin to the popular attribute descriptors, our feature vector comprises the outputs of a set of classifiers evaluated on the image. However, unlike traditional attributes wh ..."
Cited by 15 (1 self)
Abstract In this paper we introduce a novel image descriptor enabling accurate object categorization even with linear models. Akin to the popular attribute descriptors, our feature vector comprises the outputs of a set of classifiers evaluated on the image. However, unlike traditional attributes which represent hand-selected object classes and predefined visual properties, our features are learned automatically and correspond to "abstract" categories, which we name metaclasses. Each meta-class is a super-category obtained by grouping a set of object classes such that, collectively, they are easy to distinguish from other sets of categories. By using "learnability" of the meta-classes as criterion for feature generation, we obtain a set of attributes that encode general visual properties shared by multiple object classes and that are effective in describing and recognizing even novel categories, i.e., classes not present in the training set. We demonstrate that simple linear SVMs trained on our meta-class descriptor significantly outperform the best known classifier on the Caltech256 benchmark. We also present results on the 2010 ImageNet Challenge database where our system produces results approaching those of the best systems, but at a much lower computational cost.

Citation Context

...ctors are often stored in compressed form and they are decompressed on the fly “one at a time” during training and testing [27, 15]. An exception is the work of Lin et al. [20] where the high storage and I/O costs caused by their high-dimensional descriptor were absorbed by a large system infrastructure consisting of Apache Hadoop to distribute computation and storage over many machines. Finally, the third strand of related work involves the use of image descriptors encoding categorical information as features: the image is represented in terms of its relation to a set of basis object classes [33, 29, 7] or as the response map to a set of detectors [19]. Even linear models applied to these high-level representations have been shown to produce good categorization accuracy. These descriptors can be viewed as generalizing attributes [16, 11, 17], which are semantic characteristics selected by humans as associated to the classes to recognize. Our approach is most closely related to this third line of work, as we also represent images in terms of the outputs of classifiers learned for a set of basis classes. However, we demonstrate that better accuracy can be achieved by learning the basis classes...
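As a sketch of the descriptor idea above (classifier outputs over a set of basis meta-classes used as image features), the following assumes the meta-class grouping is already given; the paper's actual contribution, discovering that grouping by "learnability", is not reproduced here.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_metaclass_bank(X, metaclass_labels):
    """Train one binary linear classifier per (assumed, precomputed) meta-class
    grouping; the bank's outputs become the image descriptor."""
    bank = []
    for m in np.unique(metaclass_labels):
        clf = LinearSVC(C=1.0).fit(X, (metaclass_labels == m).astype(int))
        bank.append(clf)
    return bank

def metaclass_descriptor(bank, X):
    """Stack the decision values of every meta-class classifier: (n, n_meta)."""
    return np.stack([clf.decision_function(X) for clf in bank], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X_train = rng.standard_normal((300, 64))            # low-level features
    meta = rng.integers(0, 8, size=300)                  # toy meta-class ids
    bank = train_metaclass_bank(X_train, meta)
    X_new = rng.standard_normal((10, 64))                # novel-class images
    print(metaclass_descriptor(bank, X_new).shape)       # (10, 8)
```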

Learning Hierarchical Similarity Metrics

by Nakul Verma, Dhruv Mahajan, Sundararajan Sellamanickam, Vinod Nair
"... Categories in multi-class data are often part of an underlying semantic taxonomy. Recent work in object classification has found interesting ways to use this taxonomy structure to develop better recognition algorithms. Here we propose a novel framework to learn similarity metrics using the class tax ..."
Cited by 14 (0 self)
Categories in multi-class data are often part of an underlying semantic taxonomy. Recent work in object classification has found interesting ways to use this taxonomy structure to develop better recognition algorithms. Here we propose a novel framework to learn similarity metrics using the class taxonomy. We show that a nearest neighbor classifier using the learned metrics gets improved performance over the best discriminative methods. Moreover, by incorporating the taxonomy, our learned metrics can also help in some taxonomy specific applications. We show that the metrics can help determine the correct placement of a new category that was not part of the original taxonomy, and can provide effective classification amongst categories local to specific subtrees of the taxonomy.

Citation Context

... [14], for instance, learn a taxonomy from images and the associated tag information to perform classification, annotation, and automatic hierarchical organization of a photo collection. Deng et al. [5] use a taxonomy to learn an image similarity function for improved retrieval. Salakhutdinov et al. [19] show that learning in a hierarchy can improve recognition performance on classes with a small nu...
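A toy illustration of one way a taxonomy can shape a similarity metric: per-node diagonal weights accumulated along the path to the root, so that sibling classes share their ancestors' metrics. The taxonomy, the weights, and the functional form are assumptions, not the learned metrics of the paper.

```python
import numpy as np

# Toy taxonomy: child -> parent (root has parent None). Names are illustrative.
PARENT = {"animal": None, "dog": "animal", "cat": "animal"}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def hierarchical_distance(x, y, node, node_weights):
    """Squared distance accumulated over per-node diagonal metrics along the
    path from `node` to the root."""
    diff = x - y
    return sum(float(diff @ (node_weights[v] * diff)) for v in path_to_root(node))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    d = 16
    weights = {v: rng.random(d) for v in PARENT}     # learned in the real method
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    print(hierarchical_distance(x, y, "dog", weights))
```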

Semantic-aware Co-indexing for Image Retrieval

by Shiliang Zhang, Yuanqing Lin, Qi Tian (NEC Laboratories America)
"... Inverted indexes in image retrieval not only allow fast access to database images but also summarize all knowledge about the database, so that their discriminative capacity largely determines the retrieval performance. In this paper, for vocabulary tree based image retrieval, we propose a semantic-a ..."
Cited by 4 (0 self)
Inverted indexes in image retrieval not only allow fast access to database images but also summarize all knowledge about the database, so that their discriminative capacity largely determines the retrieval performance. In this paper, for vocabulary tree based image retrieval, we propose a semantic-aware co-indexing algorithm to jointly embed two strong cues into the inverted indexes: 1) local invariant features that are robust to delineate low-level image contents, and 2) semantic attributes from large-scale object recognition that may reveal image semantic meanings. For an initial set of inverted indexes of local features, we utilize 1000 semantic attributes to filter out isolated images and insert semantically similar images to the initial set. Encoding these two distinct cues together effectively enhances the discriminative capability of inverted indexes. Such co-indexing operations are entirely off-line and introduce only a small computational overhead to online queries, because only local features, not semantic attributes, are used for the query. Experiments and comparisons with recent retrieval methods on 3 datasets, i.e., UKbench, Holidays, and Oxford5K, plus 1.3 million images from Flickr as distractors, manifest the competitive performance of our method.

Citation Context

...[13]; finding similar images [20] by comparing hashing codes [21] of global features like GIST [15]; or retrieving objects of the same category by classifying images to multiple classes or attributes [5, 22, 2, 25]. This raises a natural question that how one retrieval method might take into account of multiple criteria in finding the candidates, e.g., returning near-duplicates to a query if presented in a data...
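The co-indexing idea above (enriching local-feature posting lists with semantically similar images found via attribute scores) can be sketched as follows; the data layout and the nearest-neighbour insertion rule are simplified assumptions, and the paper's pruning of semantically isolated images is omitted.

```python
from collections import defaultdict
import numpy as np

def build_index(visual_words):
    """Plain inverted index: visual word -> list of image ids."""
    index = defaultdict(list)
    for img_id, words in visual_words.items():
        for w in set(words):
            index[w].append(img_id)
    return index

def co_index(index, attr_scores, k=2):
    """For every posting list, append each member's k nearest neighbours under
    the attribute scores, so semantically similar images share entries."""
    ids = list(attr_scores)
    feats = np.stack([attr_scores[i] for i in ids])
    sims = feats @ feats.T
    nn = {ids[i]: [ids[j] for j in np.argsort(-sims[i])[1:k + 1]]
          for i in range(len(ids))}
    for w, postings in index.items():
        extra = {n for img in postings for n in nn[img]} - set(postings)
        postings.extend(sorted(extra))
    return index

if __name__ == "__main__":
    words = {"a": [1, 2], "b": [2, 3], "c": [7]}
    rng = np.random.default_rng(4)
    attrs = {i: rng.random(1000) for i in words}       # 1000-d attribute scores
    print(dict(co_index(build_index(words), attrs)))
```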

Multi-Class Open Set Recognition Using Probability of Inclusion

by Lalit P. Jain, Walter J. Scheirer, Terrance E. Boult
"... Abstract. The perceived success of recent visual recognition approaches has largely been derived from their performance on classification tasks, where all possible classes are known at training time. But what about open set problems, where unknown classes appear at test time? Intuitively, if we coul ..."
Cited by 4 (3 self)
Abstract. The perceived success of recent visual recognition approaches has largely been derived from their performance on classification tasks, where all possible classes are known at training time. But what about open set problems, where unknown classes appear at test time? Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under an assumption of incomplete class knowledge. In this paper, we formulate the problem as one of modeling positive training data at the decision boundary, where we can invoke the statistical extreme value theory. A new algorithm called the PI-SVM is introduced for estimating the unnormalized posterior probability of class inclusion.

Citation Context

...s recognition problems in computer vision, posterior probabilities are widely used to make decisions in applications such as pedestrian classification and orientation estimation [18], image retrieval [15], attribute fusion [42], part-based human tracking [47], large-scale multiclass object categorization [4], and activity recognition [39], among others. To operate in open set scenarios, a threshold fo...
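A hedged sketch of the open-set decision the abstract describes: a one-vs-rest SVM whose positive decision scores nearest the boundary are fitted with a Weibull (the extreme value theory ingredient), with the CDF used as an unnormalised probability of inclusion. This is a stand-in for, not a reproduction of, the published PI-SVM estimator.

```python
import numpy as np
from sklearn.svm import SVC
from scipy.stats import weibull_min

def fit_inclusion_model(X_pos, X_neg, tail=0.5):
    """Binary SVM plus a Weibull fit on the positive scores closest to the
    decision boundary (an EVT-flavoured approximation)."""
    X = np.vstack([X_pos, X_neg])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]
    svm = SVC(kernel="linear").fit(X, y)
    scores = svm.decision_function(X_pos)
    boundary = np.sort(scores)[: max(3, int(tail * len(scores)))]
    shape, loc, scale = weibull_min.fit(boundary, floc=boundary.min() - 1e-6)
    return svm, (shape, loc, scale)

def p_inclusion(svm, weibull, X, threshold=0.2):
    """Unnormalised probability of class inclusion; reject as 'unknown'
    below the threshold (open-set decision)."""
    shape, loc, scale = weibull
    probs = weibull_min.cdf(svm.decision_function(X), shape, loc=loc, scale=scale)
    return probs, probs < threshold

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    pos, neg = rng.normal(2, 1, (80, 8)), rng.normal(-2, 1, (80, 8))
    svm, wb = fit_inclusion_model(pos, neg)
    queries = np.vstack([rng.normal(2, 1, (3, 8)),   # known-class samples
                         rng.normal(0, 1, (3, 8))])  # samples of an unseen class
    print(p_inclusion(svm, wb, queries))
```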

VQA: Visual Question Answering

by Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
"... Abstract—We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring many real-world scenarios, such as helping the visually impaired, both the q ..."
Cited by 3 (0 self)
Abstract—We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring many real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing 100,000s of images and questions and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.

Citation Context

...mages). Our questions are generated by humans, making the need for commonsense knowledge and complex reasoning more essential. Describing Visual Content. Related to VQA are the tasks of image tagging [9], [24], image captioning [25], [15], [33], [7], [14], [42], [10], [20], [31], [22] and video captioning [38], [18], where words or sentences are generated to describe visual content. While these tasks...

Attribute-augmented semantic hierarchy: Towards bridging semantic gap and intention gap in image retrieval

by Hanwang Zhang, Zheng-jun Zha, Yang Yang, Shuicheng Yan, Yue Gao, Tat-seng Chua, 2013
"... This paper presents a novel Attribute-augmented Semantic Hierarchy (A2SH) and demonstrates its effectiveness in bridging both the semantic and intention gaps in Content-based Image Retrieval (CBIR). A2SH organizes the semantic concepts into multiple semantic levels and augments each concept with a s ..."
Cited by 2 (0 self)
This paper presents a novel Attribute-augmented Semantic Hierarchy (A2SH) and demonstrates its effectiveness in bridging both the semantic and intention gaps in Content-based Image Retrieval (CBIR). A2SH organizes the semantic concepts into multiple semantic levels and augments each concept with a set of related attributes. The attributes are used to describe the multiple facets of the concept and act as the intermediate bridge connecting the concept and low-level visual content. A hierarchical semantic similarity function is learnt to characterize the semantic similarities among images for retrieval. To better capture user search intent, a hybrid feedback mechanism is developed, which collects hybrid feedbacks on attributes and images. These feedbacks are then used to refine the search results based on A2SH. We use A2SH as the basis to develop a unified content-based image retrieval system. We conduct extensive experiments on a large-scale data set of over one million Web images. Experimental results show that the proposed A2SH can characterize the semantic affinities among images accurately and can shape user search intent precisely and quickly, leading to more accurate search results as compared to state-of-the-art CBIR solutions.
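As a toy illustration of a hierarchical semantic similarity of the kind A2SH learns, the function below combines level-by-level agreement of a predicted concept path with cosine similarity of attribute scores; the concept paths, level weights, and mixing weight are made-up assumptions, not the learnt function from the paper.

```python
import numpy as np

def hierarchical_semantic_similarity(img_a, img_b, level_weights, attr_weight=0.3):
    """Toy stand-in for a hierarchical semantic similarity: agreement of the
    predicted concept path, level by level, plus cosine similarity of attribute
    scores. img_* = {"path": ["animal", "dog", "husky"], "attrs": np.ndarray}."""
    path_a, path_b = img_a["path"], img_b["path"]
    concept_sim = sum(
        w for lvl, w in enumerate(level_weights)
        if lvl < min(len(path_a), len(path_b)) and path_a[lvl] == path_b[lvl]
    ) / sum(level_weights)
    a, b = img_a["attrs"], img_b["attrs"]
    attr_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return (1 - attr_weight) * concept_sim + attr_weight * attr_sim

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    q  = {"path": ["animal", "dog", "husky"],  "attrs": rng.random(32)}
    db = {"path": ["animal", "dog", "poodle"], "attrs": rng.random(32)}
    print(hierarchical_semantic_similarity(q, db, level_weights=[1.0, 2.0, 4.0]))
```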