Results 1 - 10 of 14
WhittleSearch: Image Search with Relative Attribute Feedback
"... We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query “black shoes”, the user might state, “Show m ..."
Abstract
-
Cited by 51 (15 self)
- Add to MetaCart
(Show Context)
We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query “black shoes”, the user might state, “Show me shoe images like these, but sportier.” Offline, our approach first learns a set of ranking functions, each of which predicts the relative strength of a nameable attribute in an image (‘sportiness’, ‘furriness’, etc.). At query time, the system presents an initial set of reference images, and the user selects among them to provide relative attribute feedback. Using the resulting constraints in the multi-dimensional attribute space, our method updates its relevance function and re-ranks the pool of images. This procedure iterates using the accumulated constraints until the top ranked images are acceptably close to the user’s envisioned target. In this way, our approach allows a user to efficiently “whittle away” irrelevant portions of the visual feature space, using semantic language to precisely communicate her preferences to the system. We demonstrate the technique for refining image search for people, products, and scenes, and show it outperforms traditional binary relevance feedback in terms of search speed and accuracy.
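A minimal sketch of the re-ranking step, with invented names and a simple count of satisfied constraints standing in for the paper's relevance function:

```python
import numpy as np

def whittle_rank(attr_scores, feedback):
    """Rank database images by how many relative-attribute constraints
    from the user's feedback they satisfy.

    attr_scores : (n_images, n_attributes) array of predicted attribute
                  strengths from pre-trained ranking functions.
    feedback    : list of (ref_idx, attr_idx, direction) tuples, where
                  direction = +1 means "more than the reference image"
                  and -1 means "less than the reference image".
    """
    satisfied = np.zeros(attr_scores.shape[0], dtype=int)
    for ref_idx, attr_idx, direction in feedback:
        ref_val = attr_scores[ref_idx, attr_idx]
        if direction > 0:
            satisfied += attr_scores[:, attr_idx] > ref_val
        else:
            satisfied += attr_scores[:, attr_idx] < ref_val
    return np.argsort(-satisfied)  # most constraints satisfied first

# toy usage: 5 images, 2 attributes; the target should be "sportier"
# than image 0 (attribute 0) and less "shiny" than image 3 (attribute 1)
scores = np.random.rand(5, 2)
print(whittle_rank(scores, [(0, 0, +1), (3, 1, -1)]))
```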
Attributes for Classifier Feedback
"... Abstract. Traditional active learning allows a (machine) learner to query the (human) teacher for labels on examples it finds confusing. The teacher then pro-vides a label for only that instance. This is quite restrictive. In this paper, we pro-pose a learning paradigm in which the learner communica ..."
Abstract
-
Cited by 26 (4 self)
- Add to MetaCart
Traditional active learning allows a (machine) learner to query the (human) teacher for labels on examples it finds confusing. The teacher then provides a label for only that instance. This is quite restrictive. In this paper, we propose a learning paradigm in which the learner communicates its belief (i.e. predicted label) about the actively chosen example to the teacher. The teacher then confirms or rejects the predicted label. More importantly, if rejected, the teacher communicates an explanation for why the learner’s belief was wrong. This explanation allows the learner to propagate the feedback provided by the teacher to many unlabeled images. This allows a classifier to better learn from its mistakes, leading to accelerated discriminative learning of visual concepts even with few labeled images. In order for such communication to be feasible, it is crucial to have a language that both the human supervisor and the machine learner understand. Attributes provide precisely this channel. They are human-interpretable mid-level visual concepts shareable across categories, e.g. “furry”, “spacious”, etc. We advocate the use of attributes for a supervisor to provide feedback to a classifier and directly communicate his knowledge of the world. We employ a straightforward approach to incorporate this feedback in the classifier, and demonstrate its power on a variety of visual recognition scenarios such as image classification and annotation. This application of attributes for providing feedback to classifiers is very powerful, and has not been explored in the community. It introduces a new mode of supervision, and opens up several avenues for future research.
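As a rough illustration of how one attribute-based explanation can relabel many unlabeled images, here is a hypothetical thresholding rule; the function name, arguments, and rule are assumptions, not the paper's exact update:

```python
import numpy as np

def propagate_attribute_feedback(attr_scores, unlabeled_idx,
                                 attr_idx, threshold, rejected_label):
    """Teacher says: "the prediction `rejected_label` is wrong because
    attribute `attr_idx` is too weak in this image."  Propagate that
    explanation: every unlabeled image whose attribute score falls below
    `threshold` becomes a negative example for `rejected_label`.
    Returns (image_index, label, is_positive) triples to add to training.
    """
    attr_scores = np.asarray(attr_scores)
    return [(i, rejected_label, False)
            for i in unlabeled_idx
            if attr_scores[i, attr_idx] < threshold]
```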
Learning the Visual Interpretation of Sentences
"... Sentences that describe visual scenes contain a wide va-riety of information pertaining to the presence of objects, their attributes and their spatial relations. In this paper we learn the visual features that correspond to semantic phrases derived from sentences. Specifically, we extract predicate ..."
Abstract
-
Cited by 10 (5 self)
- Add to MetaCart
(Show Context)
Sentences that describe visual scenes contain a wide variety of information pertaining to the presence of objects, their attributes and their spatial relations. In this paper we learn the visual features that correspond to semantic phrases derived from sentences. Specifically, we extract predicate tuples that contain two nouns and a relation. The relation may take several forms, such as a verb, preposition, adjective or their combination. We model a scene using a Conditional Random Field (CRF) formulation where each node corresponds to an object, and the edges to their relations. We determine the potentials of the CRF using the tuples extracted from the sentences. We generate novel scenes depicting the sentences’ visual meaning by sampling from the CRF. The CRF is also used to score a set of scenes for a text-based image retrieval task. Our results show we can generate (retrieve) scenes that convey the desired semantic meaning, even when scenes (queries) are described by multiple sentences. Significant improvement is found over several baseline approaches.
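A toy illustration of how node and edge potentials could score a candidate scene; the dictionaries and names are invented, and learning the potentials from the extracted tuples is omitted:

```python
def score_scene(objects, relations, unary, pairwise):
    """Score a candidate scene under a simple CRF-style energy.

    objects   : list of (object_id, attribute_id) pairs placed in the scene
    relations : list of (object_a, relation_id, object_b) tuples, e.g.
                predicate tuples parsed from the sentences
    unary     : dict (object_id, attribute_id) -> potential
    pairwise  : dict (object_a, relation_id, object_b) -> potential
    """
    score = sum(unary.get(obj_attr, 0.0) for obj_attr in objects)
    score += sum(pairwise.get(rel, 0.0) for rel in relations)
    return score

# toy usage: "a brown dog sits under a tree"
unary = {("dog", "brown"): 0.5}
pairwise = {("dog", "under", "tree"): 1.2}
print(score_scene([("dog", "brown")], [("dog", "under", "tree")], unary, pairwise))
```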
Accessible Image Search
, 2009
"... There are about 8 % of men and 0.8 % of women suffering from colorblindness. We show that the existing image search techniques cannot provide satisfactory results for these users, since many images will not be well perceived by them due to the loss of color information. In this paper, we introduce a ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
(Show Context)
About 8% of men and 0.8% of women suffer from colorblindness. We show that the existing image search techniques cannot provide satisfactory results for these users, since many images will not be well perceived by them due to the loss of color information. In this paper, we introduce a scheme named Accessible Image Search (AIS) to accommodate these users. Different from the general image search scheme that aims at returning more relevant results, AIS further takes into account the colorblind accessibilities of the returned results, i.e., the image qualities in the eyes of colorblind users. The scheme includes two components: accessibility assessment and accessibility improvement. For accessibility assessment, we introduce an analysis-based method and a learning-based method. Based on the measured accessibility scores, different reranking methods can be performed to prioritize the images with high accessibility. In the accessibility improvement component, we propose an efficient recoloring algorithm to modify the colors of the images such that they can be better perceived by colorblind users. We also propose the Accessibility Average Precision (AAP) for AIS as a complementary performance evaluation measure to the conventional relevance-based evaluation methods. Experimental results with more than 60,000 images and 20 anonymous colorblind users demonstrate the effectiveness and usefulness of the proposed scheme.
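One way to picture the reranking component is a weighted blend of relevance and accessibility scores; the blending rule and weight below are assumptions for illustration, not the paper's reranking methods:

```python
def accessibility_rerank(results, accessibility, alpha=0.5):
    """Re-rank results by (1 - alpha) * relevance + alpha * accessibility.

    results       : list of (image_id, relevance_score) pairs
    accessibility : dict image_id -> accessibility score in [0, 1]
    alpha         : assumed trade-off knob between relevance and accessibility
    """
    fused = [(img, (1 - alpha) * rel + alpha * accessibility[img])
             for img, rel in results]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)

# toy usage: the more accessible image moves ahead despite lower relevance
print(accessibility_rerank([("a.jpg", 0.9), ("b.jpg", 0.8)],
                           {"a.jpg": 0.2, "b.jpg": 0.95}))
```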
Minimally needed evidence for complex event recognition in unconstrained videos
- In ICMR, 2014
"... This paper addresses the fundamental question – How do humans recognize complex events in videos? Normally, humans view videos in a sequential manner. We hypothesize that humans can make high-level inference such as an event is present or not in a video, by looking at a very small number of frames n ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
This paper addresses the fundamental question: how do humans recognize complex events in videos? Normally, humans view videos in a sequential manner. We hypothesize that humans can make high-level inferences, such as whether an event is present in a video, by looking at a very small number of frames, not necessarily in a linear order. We attempt to verify this cognitive capability of humans and to discover the Minimally Needed Evidence (MNE) for each event. To this end, we introduce an online game-based event quiz facilitating selection of the minimal evidence required by humans to judge the presence or absence of a complex event in an open source video. Each video is divided into a set of temporally coherent microshots (1.5 secs in length) which are revealed only on player request. The player’s task is to identify the positive and negative occurrences of the given target event with a minimal number of requests to reveal evidence. Incentives are given to players for correct identification with the minimal number of requests. Our extensive human study using the game quiz validates our hypothesis: 55% of videos need only one microshot for correct human judgment, and events of varying complexity require different amounts of evidence for human judgment. In addition, the proposed notion of MNE enables us to select discriminative features, drastically improving the speed and accuracy of a video retrieval system.
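For concreteness, a small sketch of cutting a video into consecutive 1.5-second microshots (frame-index arithmetic only; the paper's microshots are additionally required to be temporally coherent):

```python
def microshots(n_frames, fps, shot_len_s=1.5):
    """Split a video of n_frames at the given fps into consecutive
    microshots of roughly shot_len_s seconds; returns a list of
    (start_frame, end_frame) index pairs."""
    step = max(1, int(round(shot_len_s * fps)))
    return [(start, min(start + step, n_frames))
            for start in range(0, n_frames, step)]

# toy usage: a 10-second clip at 30 fps yields 7 microshots of 45 frames
print(microshots(n_frames=300, fps=30))
```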
Implied feedback: Learning nuances of user behavior in image search
- In ICCV, 2013
"... User feedback helps an image search system refine its relevance predictions, tailoring the search towards the user’s preferences. Existing methods simply take feedback at face value: clicking on an image means the user wants things like it; commenting that an image lacks a specific attribute means t ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
User feedback helps an image search system refine its relevance predictions, tailoring the search towards the user’s preferences. Existing methods simply take feedback at face value: clicking on an image means the user wants things like it; commenting that an image lacks a specific attribute means the user wants things that have it. However, we expect there is actually more information behind the user’s literal feedback. In particular, a user’s (possibly subconscious) search strategy leads him to comment on certain images rather than others, based on how any of the visible candidate images compare to the desired content. For example, he may be more likely to give negative feedback on an irrelevant image that is relatively close to his target, as opposed to bothering with one that is altogether different. We introduce novel features to capitalize on such implied feedback cues, and learn a ranking function that uses them to improve the system’s relevance estimates. We validate the approach with real users searching for shoes, faces, or scenes using two different modes of feedback: binary relevance feedback and relative attributes-based feedback. The results show that retrieval improves significantly when the system accounts for the learned behaviors. We show that the nuances learned are domain-invariant, and useful for both generic user-independent search as well as personalized user-specific search.
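A minimal sketch of turning “commented vs. visible-but-ignored” cues into a learned ranking function, assuming per-image cue features; the pair construction below is a simplification, not the paper's feature set:

```python
import numpy as np
from sklearn.svm import LinearSVC

def implied_feedback_pairs(page_feats, commented_idx):
    """Build pairwise examples from one results page: images the user
    chose to comment on are treated as more informative than visible
    images the user ignored.  Returns difference vectors and +1/-1
    labels suitable for a linear ranking function."""
    commented = set(commented_idx)
    ignored = [i for i in range(len(page_feats)) if i not in commented]
    X, y = [], []
    for c in commented_idx:
        for i in ignored:
            X.append(page_feats[c] - page_feats[i]); y.append(+1)
            X.append(page_feats[i] - page_feats[c]); y.append(-1)
    return np.array(X), np.array(y)

# toy usage: 8 visible images described by 5 behavior-cue features each
rng = np.random.default_rng(0)
page_feats = rng.normal(size=(8, 5))
X, y = implied_feedback_pairs(page_feats, commented_idx=[1, 4])
ranker = LinearSVC().fit(X, y)   # linear ranking function over the cues
```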
Memory Recall Based Video Search: Finding Videos You Have Seen Before Based on Your Memory
"... We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which can often be expressed as a textual description and/or rough visual descriptions of the scenes. Using ..."
Abstract
- Add to MetaCart
(Show Context)
We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which can often be expressed as a textual description and/or rough visual descriptions of the scenes. Using these vague memories, we then want to search for the corresponding videos of interest. We call this “Memory Recall based Video Search” (MRVS). To tackle this problem, we propose a video search system that permits a user to input his/her vague and incomplete query as a combination of a text query, a sequence of visual queries, and/or concept queries. Here, a visual query is often in the form of a visual sketch depicting the outline of scenes within the desired video, while each corresponding concept query depicts a list of visual concepts that appears in that scene. As the query specified by users is generally approximate or incomplete, we need to develop techniques to handle this inexact and incomplete specification by also leveraging user feedback to refine the specification. We utilize several innovative approaches to enhance the automatic search. First, we employ a visual query suggestion model to automatically suggest potential visual features to users as better queries. Second, we utilize a color similarity matrix to help compensate for inexact color specification in visual queries. Third, we leverage the ordering of visual queries and/or concept queries to rerank the results by using a greedy algorithm. Moreover, as the query is inexact and there is likely to be only one or a few possible answers, we incorporate an interactive feedback loop to permit the users to label related samples which are visually similar or semantically close to the relevant sample. Based on the labeled samples, we then propose optimization algorithms to update visual queries and concept
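A simplified stand-in for the greedy, order-aware reranking: align the user's ordered queries to a candidate video's shots from left to right and sum the matched similarities. The names and the exact greedy rule are assumptions, not the paper's algorithm:

```python
def order_score(query_shot_sims):
    """query_shot_sims[q][s] is the similarity between the user's q-th
    visual/concept query and shot s of a candidate video.  Greedily match
    queries to shots in order and return the total matched similarity."""
    total, last_shot = 0.0, -1
    for per_shot in query_shot_sims:
        # only shots after the previously matched one preserve query order
        candidates = {s: sim for s, sim in enumerate(per_shot) if s > last_shot}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        total += candidates[best]
        last_shot = best
    return total

# toy usage: two ordered queries, a candidate video with four shots
print(order_score([[0.1, 0.75, 0.3, 0.2],
                   [0.8, 0.2, 0.5, 0.1]]))  # matches shots 1 then 2 -> 1.25
```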
Interactive Video Search
, 2008
"... A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices. A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture). A_CU-run4: linear weighted fusion of A ..."
Abstract
- Add to MetaCart
(Show Context)
A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices.
A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture).
A_CU-run4: linear weighted fusion of A_CU-run5 with an SVM classification result using detection scores of CU-VIREO374 as features.
C_CU-run3: linear weighted fusion of A_CU-run4 with an SVM classification result using web images.
A_CU-run2: re-rank the results of “two_people” and “singing” from A_CU-run4 with concept-specific detectors.
C_CU-run1: linear weighted fusion of A_CU-run2 with an SVM classification result using web images.
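The “linear weighted fusion” used throughout these runs is a weighted sum of per-shot scores from the contributing runs; a minimal sketch with illustrative weights and toy scores (neither the actual weights nor the score sources are given here):

```python
import numpy as np

def weighted_fusion(score_arrays, weights):
    """Linearly fuse several runs' detection scores over the same shots:
    fused = sum_i weights[i] * score_arrays[i]."""
    fused = np.zeros(len(score_arrays[0]), dtype=float)
    for w, scores in zip(weights, score_arrays):
        fused += w * np.asarray(scores, dtype=float)
    return fused

# toy usage: fuse a local-feature run with two global-feature runs
run6         = np.array([0.9, 0.2, 0.6])   # per-shot scores, toy values
color_moment = np.array([0.7, 0.1, 0.5])
wavelet_text = np.array([0.8, 0.3, 0.4])
print(weighted_fusion([run6, color_moment, wavelet_text], [0.6, 0.2, 0.2]))
```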
Description of Submitted Runs
, 2008
"... A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices. A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture). A_CU-run4: linear weighted fusion o ..."
Abstract
- Add to MetaCart
(Show Context)
A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices.
A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture).
A_CU-run4: linear weighted fusion of A_CU-run5 with an SVM classification result using detection scores of CU-VIREO374 as features.
C_CU-run3: linear weighted fusion of A_CU-run4 with an SVM classification result using web images.
A_CU-run2: re-rank the results of “two_people” and “singing” from A_CU-run4 with concept-specific detectors.
C_CU-run1: linear weighted fusion of A_CU-run2 with an SVM classification result using web images.
Interactive Video Search
I_A_1_Colu_CuZero_formulate_nov_6: novice run of CuZero using query formulation alone.
I_A_1_Colu_CuZero_formulate_exp_5: expert run of CuZero using query formulation alone.
Description of Submitted Runs High-Level Feature Extraction
, 2008
"... ! A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices.! A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture).! A_CU-run4: linear weighted fusion ..."
Abstract
- Add to MetaCart
(Show Context)
A_CU-run6: local feature alone – average fusion of 3 SVM classification results for each concept using various feature representation choices.
A_CU-run5: linear weighted fusion of A_CU-run6 with two grid-based global features (color moment and wavelet texture).
A_CU-run4: linear weighted fusion of A_CU-run5 with an SVM classification result using detection scores of CU-VIREO374 as features.
C_CU-run3: linear weighted fusion of A_CU-run4 with an SVM classification result using web images.
A_CU-run2: re-rank the results of “two_people” and “singing” from A_CU-run4 with concept-specific detectors.
C_CU-run1: linear weighted fusion of A_CU-run2 with an SVM classification result using web images.
Interactive Video Search Summary
I_A_1_Colu_CuZero_formulate_nov_6: novice run of CuZero using query formulation alone.
I_A_1_Colu_CuZero_formulate_exp_5: expert run of CuZero using query formulation alone.
I_A_1_Colu_CuZero_full_nov_2: novice run of the full browser experience within CuZero.
I_A_1_Colu_CuZero_full_exp_1: expert run of the full browser experience within CuZero.
I_A_1_Colu_CuZero_exp_exp_4: I_A_1_Colu_CuZero_full_exp_1 with story-based expansion for positively labeled subshots.
I_A_1_Colu_CuZero_reranked_exp_3: I_A_1_Colu_CuZero_exp_exp_4 with both story-based expansion and reranking of all non-expanded, non-labeled shots by near duplicate.
In this report, we present an overview and comparative analysis of our HLF detection system, which achieves the top performance among all type-A submissions in 2008. We also describe a preliminary evaluation of our video search system, CuZero, in the interactive search task. Our aim for the HLF task is to answer the following questions. What's the performance edge of local