Results 1 - 10 of 151
SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes
"... In this paper we present the first large-scale scene attribute database. First, we perform crowd-sourced human studies to find a taxonomy of 102 discriminative attributes. Next, we build the “SUN attribute database ” on top of the diverse SUN categorical database. Our attribute database spans more t ..."
Abstract
-
Cited by 66 (6 self)
- Add to MetaCart
(Show Context)
In this paper we present the first large-scale scene attribute database. First, we perform crowd-sourced human studies to find a taxonomy of 102 discriminative attributes. Next, we build the “SUN attribute database” on top of the diverse SUN categorical database. Our attribute database spans more than 700 categories and 14,000 images and has potential for use in high-level scene understanding and fine-grained scene recognition. We use our dataset to train attribute classifiers and evaluate how well these relatively simple classifiers can recognize a variety of attributes related to materials, surface properties, lighting, functions and affordances, and spatial envelope properties.
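As a rough illustration of the kind of per-attribute classifier baseline the abstract describes, the sketch below trains one independent binary classifier per attribute. It is not the authors' code: the features, labels, and the linear-SVM choice are all stand-in assumptions.

```python
# Minimal sketch: one independent binary classifier per attribute.
# X (images x features) and A (images x binary attribute labels) are
# synthetic stand-ins for precomputed image features and annotations.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))        # stand-in image features
A = rng.random((1000, 102)) < 0.1       # stand-in binary attribute labels

classifiers = []
for j in range(A.shape[1]):
    clf = LinearSVC(C=1.0)
    clf.fit(X, A[:, j])                 # each attribute is its own binary problem
    classifiers.append(clf)

# Predict the full attribute vector for a new image feature x.
x = rng.normal(size=(1, 512))
attr_scores = np.array([c.decision_function(x)[0] for c in classifiers])
print(attr_scores.shape)                # (102,)
```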
A Joint Model of Language and Perception for Grounded Attribute Learning
- Cynthia Matuszek, Evan Herbst, Luke Zettlemoyer, Dieter Fox, 2012
"... As robots become more ubiquitous and capable, it becomes ever more important for untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract representations of the meanings of natural language tied to the physical worl ..."
Abstract
-
Cited by 56 (14 self)
- Add to MetaCart
(Show Context)
As robots become more ubiquitous and capable, it becomes ever more important for untrained users to easily interact with them. Recently, this has led to the study of the language grounding problem, where the goal is to extract representations of the meanings of natural language tied to the physical world. We present an approach for joint learning of language and perception models for grounded attribute induction. The perception model includes classifiers for physical characteristics and a language model based on a probabilistic categorial grammar that enables the construction of compositional meaning representations. We evaluate on the task of interpreting sentences that describe sets of objects in a physical workspace, and demonstrate accurate task performance and effective latent-variable concept induction in physically grounded scenes.
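A hedged sketch of the grounding step only, not the paper's joint model or its categorial grammar: given per-attribute classifiers, a conjunction of attributes denotes the set of workspace objects on which every classifier fires. The classifier weights, feature dimensionality, and threshold are all illustrative assumptions.

```python
# Toy grounding of a conjunctive meaning like and(red, block) against
# workspace objects. Classifier weights and object features are synthetic.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# Hypothetical linear attribute classifiers: name -> weight vector.
classifiers = {"red": rng.normal(size=16), "block": rng.normal(size=16)}

def prob(attr, x):
    return sigmoid(classifiers[attr] @ x)

def denotation(meaning, objects, threshold=0.5):
    """meaning: list of attribute names, treated as a conjunction."""
    return [i for i, x in enumerate(objects)
            if all(prob(a, x) > threshold for a in meaning)]

objects = [rng.normal(size=16) for _ in range(5)]   # stand-in object features
print(denotation(["red", "block"], objects))        # indices satisfying both
```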
Discovering localized attributes for fine-grained recognition
- In CVPR, IEEE, 2012
"... red stripes on wings orange stripes on wings Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one other (e.g. bird species recognition). I ..."
Abstract
-
Cited by 52 (1 self)
- Add to MetaCart
(Show Context)
Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one another (e.g. bird species recognition). In such scenarios, relevant attributes are often local (e.g. “white belly”), but the question of how to choose these local attributes remains largely unexplored. In this paper, we propose an interactive approach that discovers local attributes that are both discriminative and semantically meaningful from image datasets annotated only with fine-grained category labels and object bounding boxes. Our approach uses a latent conditional random field model to discover candidate attributes that are detectable and discriminative, and then employs a recommender system that selects attributes likely to be semantically meaningful. Human interaction is used to provide semantic names for the discovered attributes. We demonstrate our method on two challenging datasets, Caltech-UCSD Birds-200-2011 and Leeds Butterflies, and find that our discovered attributes outperform those generated by traditional approaches.
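The sketch below illustrates just the candidate-ranking idea, not the paper's latent CRF or recommender system: score each hypothetical candidate local attribute by how well its detection scores separate two fine-grained classes, then keep the top few for human naming. All names and data are synthetic.

```python
# Rank candidate local attributes by a simple class-separation score,
# then surface the best ones for annotators to name. Everything is a toy.
import numpy as np

def discriminativeness(scores_pos, scores_neg):
    # Separation margin between the two categories' detection scores.
    return abs(scores_pos.mean() - scores_neg.mean()) / (
        scores_pos.std() + scores_neg.std() + 1e-8)

rng = np.random.default_rng(2)
# Hypothetical candidates: detection scores on class A vs. class B.
candidates = {f"cand_{k}": (rng.normal(k * 0.1, 1, 200), rng.normal(0, 1, 200))
              for k in range(10)}
ranked = sorted(candidates, key=lambda c: -discriminativeness(*candidates[c]))
print(ranked[:3])   # candidates to show to annotators for naming
```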
WhittleSearch: Image Search with Relative Attribute Feedback
"... We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query “black shoes”, the user might state, “Show m ..."
Abstract
-
Cited by 51 (15 self)
- Add to MetaCart
(Show Context)
We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query “black shoes”, the user might state, “Show me shoe images like these, but sportier.” Offline, our approach first learns a set of ranking functions, each of which predicts the relative strength of a nameable attribute in an image (‘sportiness’, ‘furriness’, etc.). At query time, the system presents an initial set of reference images, and the user selects among them to provide relative attribute feedback. Using the resulting constraints in the multi-dimensional attribute space, our method updates its relevance function and re-ranks the pool of images. This procedure iterates using the accumulated constraints until the top-ranked images are acceptably close to the user’s envisioned target. In this way, our approach allows a user to efficiently “whittle away” irrelevant portions of the visual feature space, using semantic language to precisely communicate his/her preferences to the system. We demonstrate the technique for refining image search for people, products, and scenes, and show it outperforms traditional binary relevance feedback in terms of search speed and accuracy.
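A minimal sketch of the feedback step described above, assuming the attribute ranking functions are already learned (here replaced by a random strength matrix): each piece of relative feedback becomes a constraint that whittles down the candidate pool.

```python
# Apply relative attribute constraints to filter a candidate image pool.
# 'strength' stands in for learned ranking-function outputs.
import numpy as np

rng = np.random.default_rng(3)
n_images, n_attrs = 500, 4
strength = rng.normal(size=(n_images, n_attrs))   # attribute strength per image

def apply_feedback(candidates, constraints):
    """constraints: list of (attr_index, ref_image, direction), where
    direction=+1 means 'more than ref' and -1 means 'less than ref'."""
    keep = []
    for i in candidates:
        ok = all(direction * (strength[i, a] - strength[ref, a]) > 0
                 for a, ref, direction in constraints)
        if ok:
            keep.append(i)
    return keep

# 'Show me images sportier than image 10 but less furry than image 42.'
pool = apply_feedback(range(n_images), [(0, 10, +1), (1, 42, -1)])
print(len(pool), "images remain after whittling")
```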
NEIL: Extracting visual knowledge from web data
- In ICCV
"... We propose NEIL (Never Ending Image Learner), a com-puter program that runs 24 hours per day and 7 days per week to automatically extract visual knowledge from In-ternet data. NEIL uses a semi-supervised learning algo-rithm that jointly discovers common sense relationships (e.g., “Corolla is a kind ..."
Abstract
-
Cited by 48 (3 self)
- Add to MetaCart
(Show Context)
We propose NEIL (Never Ending Image Learner), a computer program that runs 24 hours per day, 7 days per week, to automatically extract visual knowledge from Internet data. NEIL uses a semi-supervised learning algorithm that jointly discovers common sense relationships (e.g., “Corolla is a kind of/looks similar to Car”, “Wheel is a part of Car”) and labels instances of the given visual categories. It is an attempt to develop the world’s largest visual structured knowledge base with minimum human labeling effort. As of 10th October 2013, NEIL has been continuously running for 2.5 months on a 200-core cluster (more than 350K CPU hours) and has an ontology of 1152 object categories, 1034 scene categories, and 87 attributes. During this period, NEIL has discovered more than 1700 relationships and has labeled more than 400K visual instances.
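A toy, self-contained sketch of the bootstrapping pattern the abstract describes, not the released NEIL system: seed each concept from (simulated) web search, train a classifier, absorb only high-confidence new instances, and repeat. Web retrieval and image features are faked with random vectors.

```python
# Semi-supervised bootstrapping loop in the spirit of NEIL, with web
# search and image features simulated by Gaussian samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

def fake_web_search(concept_mean, k):
    # Stand-in for image retrieval: samples near a concept-specific mean.
    return concept_mean + rng.normal(scale=2.0, size=(k, 32))

means = {"car": rng.normal(size=32), "wheel": rng.normal(size=32)}
labeled = {c: fake_web_search(m, 30) for c, m in means.items()}  # seed sets

for _ in range(3):                                   # bootstrapping rounds
    X = np.vstack(list(labeled.values()))
    y = np.concatenate([[c] * len(v) for c, v in labeled.items()])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    for c, m in means.items():
        pool = fake_web_search(m, 200)               # new unlabeled candidates
        proba = clf.predict_proba(pool)[:, list(clf.classes_).index(c)]
        labeled[c] = np.vstack([labeled[c], pool[proba > 0.9]])  # keep confident
print({c: len(v) for c, v in labeled.items()})
```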
Attribute Learning for Understanding Unstructured Social Activity
- In ECCV 2012, Part IV, LNCS, 2012
"... Abstract. The rapid development of social video sharing platforms has created a huge demand for automatic video classification and annotation techniques, in particular for videos containing social activities of a group of people (e.g. YouTube video of a wedding reception). Recently, attribute learni ..."
Abstract
-
Cited by 28 (14 self)
- Add to MetaCart
(Show Context)
The rapid development of social video sharing platforms has created a huge demand for automatic video classification and annotation techniques, in particular for videos containing social activities of a group of people (e.g. a YouTube video of a wedding reception). Recently, attribute learning has emerged as a promising paradigm for transfer learning to sparsely labelled classes in object or single-object short action classification. In contrast to existing work, this paper, for the first time, tackles the problem of attribute learning for understanding group social activities with sparse labels. This problem is more challenging because of the complex multi-object nature of social activities and the unstructured nature of the activity context. To solve this problem, we (1) contribute an unstructured social activity attribute (USAA) dataset with both visual and audio attributes, (2) introduce the concept of a semi-latent attribute space, and (3) propose a novel model for learning the latent attributes which alleviates the dependence of existing models on exact and exhaustive manual specification of the attribute space. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multi-task learning, N-shot transfer learning, learning with label noise, and, importantly, zero-shot learning.
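One way to picture a semi-latent attribute space, as a loose sketch rather than the paper's model (which is topic-model based; plain NMF stands in for the latent part here): concatenate the manually defined attribute annotations with latent factors discovered from the raw features.

```python
# Semi-latent attribute representation: user-defined attributes plus
# latent factors. All data here is synthetic; NMF is a stand-in.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)
low_level = rng.random((300, 100))                  # stand-in audio-visual features
manual_attrs = (rng.random((300, 20)) < 0.3).astype(float)  # annotated part

latent = NMF(n_components=10, max_iter=500).fit_transform(low_level)
semi_latent = np.hstack([manual_attrs, latent])     # semi-latent attribute space
print(semi_latent.shape)                            # (300, 30)
```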
Attributes for Classifier Feedback
"... Abstract. Traditional active learning allows a (machine) learner to query the (human) teacher for labels on examples it finds confusing. The teacher then pro-vides a label for only that instance. This is quite restrictive. In this paper, we pro-pose a learning paradigm in which the learner communica ..."
Abstract
-
Cited by 26 (4 self)
- Add to MetaCart
(Show Context)
Traditional active learning allows a (machine) learner to query the (human) teacher for labels on examples it finds confusing. The teacher then provides a label for only that instance. This is quite restrictive. In this paper, we propose a learning paradigm in which the learner communicates its belief (i.e. predicted label) about the actively chosen example to the teacher. The teacher then confirms or rejects the predicted label. More importantly, if rejected, the teacher communicates an explanation for why the learner’s belief was wrong. This explanation allows the learner to propagate the feedback provided by the teacher to many unlabeled images, so a classifier can better learn from its mistakes, leading to accelerated discriminative learning of visual concepts even with few labeled images. In order for such communication to be feasible, it is crucial to have a language that both the human supervisor and the machine learner understand. Attributes provide precisely this channel: they are human-interpretable mid-level visual concepts shareable across categories, e.g. “furry”, “spacious”, etc. We advocate the use of attributes for a supervisor to provide feedback to a classifier and directly communicate his/her knowledge of the world. We employ a straightforward approach to incorporate this feedback in the classifier, and demonstrate its power on a variety of visual recognition scenarios such as image classification and annotation. This application of attributes for providing feedback to classifiers is very powerful and has not been explored in the community. It introduces a new mode of supervision, and opens up several avenues for future research.
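A toy sketch of propagating a single attribute-based explanation, not the paper's exact update rule: if the teacher explains a rejection with, say, "this scene is too open to be a forest", the learner penalizes the 'forest' score of every unlabeled image that is at least as open. All scores here are synthetic.

```python
# Propagate one attribute-based explanation to many unlabeled images.
import numpy as np

rng = np.random.default_rng(6)
n = 200
openness = rng.random(n)              # relative attribute strength per image
forest_score = rng.random(n)          # current classifier scores for 'forest'

queried = 17                          # actively chosen example, label rejected
# Teacher: images as open as (or more open than) image 17 are not forests.
mask = openness >= openness[queried]
forest_score[mask] -= 0.5             # one explanation updates many images
print(mask.sum(), "unlabeled images updated from a single explanation")
```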
Constrained Semi-Supervised Learning using Attributes and Comparative Attributes
"... Abstract. We consider the problem of semi-supervised bootstrap learning for scene categorization. Existing semi-supervised approaches are typically unreliable and face semantic drift because the learning task is under-constrained. This is primarily because they ignore the strong interactions that of ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
(Show Context)
We consider the problem of semi-supervised bootstrap learning for scene categorization. Existing semi-supervised approaches are typically unreliable and face semantic drift because the learning task is under-constrained. This is primarily because they ignore the strong interactions that often exist between scene categories, such as the common attributes shared across categories as well as the attributes which make one scene different from another. The goal of this paper is to exploit these relationships and constrain the semi-supervised learning problem. For example, the knowledge that an image is an auditorium can improve the labeling of amphitheaters by enforcing the constraint that an amphitheater image should have more circular structures than an auditorium image. We propose constraints based on mutual exclusion, binary attributes, and comparative attributes, and show that they help us to constrain the learning problem and avoid semantic drift. We demonstrate the effectiveness of our approach through extensive experiments, including results on a very large dataset of one million images.
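The abstract's auditorium/amphitheater example, reduced to a toy comparative-attribute constraint (not the authors' full formulation): candidate labels that violate the expected attribute ordering are rejected before entering the training set.

```python
# Reject bootstrapped labels that violate a comparative-attribute constraint:
# amphitheaters should score higher on 'circular structure' than auditoriums.
import numpy as np

rng = np.random.default_rng(7)
circular = rng.random(100)                       # attribute scores, synthetic
proposed = rng.choice(["auditorium", "amphitheater"], size=100)

auditorium_mean = circular[proposed == "auditorium"].mean()
accepted = [i for i in range(100)
            if proposed[i] != "amphitheater" or circular[i] > auditorium_mean]
print(len(accepted), "of 100 candidate labels survive the constraint")
```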
Deformable part descriptors for fine-grained recognition and attribute prediction
- In ICCV, 2013
"... Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains ..."
Abstract
-
Cited by 24 (3 self)
- Add to MetaCart
(Show Context)
Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and the Berkeley Human Attribute dataset demonstrate significant improvements over state-of-the-art algorithms.
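A minimal sketch of the pose-normalization idea in general, not the paper's DPM-based descriptors: extract a feature from each detected part and concatenate them in a fixed semantic order, so the same dimensions always describe the same part regardless of pose. Part names, boxes, and features are placeholders.

```python
# Build a pose-normalized descriptor by concatenating per-part features
# in a fixed semantic part order. Features are faked with random vectors.
import numpy as np

rng = np.random.default_rng(8)
PART_ORDER = ["head", "body", "tail"]            # fixed semantic part order

def part_feature(image, box):
    # Stand-in for pooling a real descriptor (e.g. HOG or CNN) inside the box.
    return rng.normal(size=64)

def pose_normalized_descriptor(image, detected_parts):
    """detected_parts: dict mapping part name -> bounding box (may be missing)."""
    feats = []
    for part in PART_ORDER:
        box = detected_parts.get(part)
        # A missing part contributes zeros so dimensions stay aligned.
        feats.append(part_feature(image, box) if box is not None else np.zeros(64))
    return np.concatenate(feats)

desc = pose_normalized_descriptor(None, {"head": (0, 0, 32, 32),
                                         "body": (10, 10, 64, 64)})
print(desc.shape)                                # (192,)
```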
Style-aware mid-level representation for discovering visual connections in space and time
- In Proc. ICCV, 2013
"... We present a weakly-supervised visual data mining approach that discovers connections between recurring midlevel visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods that mine ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
(Show Context)
We present a weakly-supervised visual data mining approach that discovers connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods that mine for patterns that remain visually consistent throughout the dataset, our goal is to discover visual elements whose appearance changes due to change in time or location, i.e., that exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are style-sensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element’s range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method’s effectiveness on the related task of fine-grained classification.
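An illustrative sketch of only the final regression step, not the discovery pipeline: a style-aware regressor maps one visual element's appearance to a date. The features and dates below are synthetic, with one dimension made weakly predictive so the toy example does something.

```python
# Train one style-aware regressor for a single discovered visual element:
# map the element's appearance to a capture date. All data is synthetic.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(9)
patch_feats = rng.normal(size=(400, 128))        # one element across the dataset
years = 1920 + 80 * rng.random(400)              # ground-truth capture dates
# Make one feature weakly predictive of the year for this toy example.
patch_feats[:, 0] = (years - 1960) / 40 + rng.normal(scale=0.3, size=400)

reg = Ridge(alpha=1.0).fit(patch_feats, years)
print("predicted year:", reg.predict(patch_feats[:1])[0])
```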