Results 1 - 10
of
223
Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation
- In ICCV
, 2009
"... Image auto-annotation is an important open problem in computer vision. For this task we propose TagProp, a discriminatively trained nearest neighbor model. Tags of test images are predicted using a weighted nearest-neighbor model to exploit labeled training images. Neighbor weights are based on neig ..."
Abstract
-
Cited by 154 (21 self)
- Add to MetaCart
(Show Context)
Image auto-annotation is an important open problem in computer vision. For this task we propose TagProp, a discriminatively trained nearest neighbor model. Tags of test images are predicted using a weighted nearest-neighbor model to exploit labeled training images. Neighbor weights are based on neighbor rank or distance. TagProp allows the integration of metric learning by directly maximizing the log-likelihood of the tag predictions in the training set. In this manner, we can optimally combine a collection of image similarity metrics that cover different aspects of image content, such as local shape descriptors, or global color histograms. We also introduce a word specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words. We investigate the performance of different variants of our model and compare to existing work. We present experimental results for three challenging data sets. On all three, TagProp makes a marked improvement as compared to the current state-of-the-art. 1.
A New Baseline for Image Annotation
"... Abstract. Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, mo ..."
Abstract
-
Cited by 138 (0 self)
- Add to MetaCart
(Show Context)
Abstract. Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, most of these works fail to compare their methods with simple baseline techniques to justify the need for complex models and subsequent training. In this work, we introduce a new baseline technique for image annotation that treats annotation as a retrieval problem. The proposed technique utilizes low-level image features and a simple combination of basic distances to find nearest neighbors of a given image. The keywords are then assigned using a greedy label transfer mechanism. The proposed baseline outperforms the current state-of-the-art methods on two standard and one large Web dataset. We believe that such a baseline measure will provide a strong platform to compare and better understand future annotation techniques. 1
A New Approach to Cross-Modal Multimedia Retrieval
"... The problem of joint modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned with latent Dirichlet allocation, and images are represented as bags of visual (SIFT) features. Two hypotheses are investig ..."
Abstract
-
Cited by 81 (5 self)
- Add to MetaCart
(Show Context)
The problem of joint modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned with latent Dirichlet allocation, and images are represented as bags of visual (SIFT) features. Two hypotheses are investigated: that 1) there is a benefit to explicitly modeling correlations between the two components, and 2) this modeling is more effective in feature spaces with higher levels of abstraction. Correlations between the two components are learned with canonical correlation analysis. Abstraction is achieved by representing text and images at a more general, semantic level. The two hypotheses are studied in the context of the task of cross-modal document retrieval. This includes retrieving the text that most closely matches a query image, or retrieving the images that most closely match a query text. It is shown that accounting for crossmodal correlations and semantic abstraction both improve retrieval accuracy. The cross-modal model is also shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
Bridging the gap: Query by semantic example
- IEEE TRANS. MULTIMEDIA
, 2007
"... A combination of query-by-visual-example (QBVE) and semantic retrieval (SR), denoted as query-by-semantic-example (QBSE), is proposed. Images are labeled with respect to a vocabulary of visual concepts, as is usual in SR. Each image is then represented by a vector, referred to as a semantic multinom ..."
Abstract
-
Cited by 73 (8 self)
- Add to MetaCart
A combination of query-by-visual-example (QBVE) and semantic retrieval (SR), denoted as query-by-semantic-example (QBSE), is proposed. Images are labeled with respect to a vocabulary of visual concepts, as is usual in SR. Each image is then represented by a vector, referred to as a semantic multinomial, of posterior concept probabilities. Retrieval is based on the query-by-example paradigm: the user provides a query image, for which 1) a semantic multinomial is computed and 2) matched to those in the database. QBSE is shown to have two main properties of interest, one mostly practical and the other philosophical. From a practical standpoint, because it inherits the generalization ability of SR inside the space of known visual concepts (referred to as the semantic space) but performs much better outside of it, QBSE produces retrieval systems that are more accurate than what was previously possible. Philosophically, because it allows a direct comparison of visual and semantic representations under a common query paradigm, QBSE enables the design of experiments that explicitly test the value of semantic representations for image retrieval. An implementation of QBSE under the minimum probability of error (MPE) retrieval framework, previously applied with success to both QBVE and SR, is proposed, and used to demonstrate the two properties. In particular, an extensive objective comparison of QBSE with QBVE is presented, showing that the former significantly outperforms the latter both inside and outside the semantic space. By carefully controlling the structure of the semantic space, it is also shown that this improvement can only be attributed to the semantic nature of the representation on which QBSE is based.
A game-based approach for collecting semantic annotations of music
- In 8th International Conference on Music Information Retrieval (ISMIR
, 2007
"... Games based on human computation are a valuable tool for collecting semantic information about images. We show how to transfer this idea into the music domain in order to collect high-quality semantic information about songs. We present Listen Game, a online, multiplayer game that measures the seman ..."
Abstract
-
Cited by 45 (1 self)
- Add to MetaCart
(Show Context)
Games based on human computation are a valuable tool for collecting semantic information about images. We show how to transfer this idea into the music domain in order to collect high-quality semantic information about songs. We present Listen Game, a online, multiplayer game that measures the semantic relationship between music and words. In the normal mode, a player sees a list of semantically related words (e.g., instruments, emotions, usages, genres) and is asked to pick the best and worst word to describe a song. In the freestyle mode, a user is asked to suggest a new word that describes the music. Each player receives realtime feedback about the agreement amongst all players. We show that we can use the data collected during a two-week pilot study of Listen Game to learn a supervised multiclass labeling (SML) model. We show that this SML model can annotate a novel song with meaningful words and retrieve relevant songs from a database of audio content. 1
Automatic image annotation using group sparsity
- In CVPR
, 2010
"... Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, but properties of features have not been well investigated. In most case ..."
Abstract
-
Cited by 42 (13 self)
- Add to MetaCart
(Show Context)
Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, but properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not well used to select features. In this paper, we introduce a regularization based feature selection algorithm to leverage both the sparsity and clustering properties of features, and incorporate it into the image annotation task. A novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both the keyword similarity and the relevance feedback. Thus keyword similarity is modeled in the annotation framework. Numerous experiments are designed to compare the performance between features, feature combinations and regularization based feature selection methods applied on the image annotation task, which gives insight into the properties of features in the image annotation task. The experimental results demonstrate that the group sparsity based method is more accurate and stable than others. 1.
Multi-instance multi-label learning
- Artificial Intelligence
"... In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicate ..."
Abstract
-
Cited by 38 (16 self)
- Add to MetaCart
(Show Context)
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Consideringthat the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly.
FIVE APPROACHES TO COLLECTING TAGS FOR MUSIC
"... We compare five approaches to collecting tags for music: conducting a survey, harvesting social tags, deploying annotation games, mining web documents, and autotagging audio content. The comparison includes a discussion of both scalability (financial cost, human involvement, and computational resour ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
(Show Context)
We compare five approaches to collecting tags for music: conducting a survey, harvesting social tags, deploying annotation games, mining web documents, and autotagging audio content. The comparison includes a discussion of both scalability (financial cost, human involvement, and computational resources) and quality (the cold start problem & popularity bias, strong vs. weak labeling, vocabulary structure & size, and annotation accuracy). We then describe one state-ofthe-art system for each approach. The performance of each system is evaluated using a tag-based music information retrieval task. Using this task, we are able to quantify the effect of popularity bias on each approach by making use of a subset of more popular (short-head) songs and a set of less popular (long-tail) songs. Lastly, we propose a simple hybrid context-content system that combines our individual approaches and produces superior retrieval results. 1
Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning
"... The Berkeley Drosophila Genome Project (BDGP) has produced a large number of gene expression patterns, many of which have been annotated textually with anatomical and developmental terms. These terms spatially correspond to local regions of the images; however, they are attached collectively to grou ..."
Abstract
-
Cited by 36 (12 self)
- Add to MetaCart
(Show Context)
The Berkeley Drosophila Genome Project (BDGP) has produced a large number of gene expression patterns, many of which have been annotated textually with anatomical and developmental terms. These terms spatially correspond to local regions of the images; however, they are attached collectively to groups of images, such that it is unknown which term is assigned to which region of which image in the group. This poses a challenge to the development of the computational method to automate the textual description of expression patterns contained in each image. In this paper, we show that the underlying nature of this task matches well with a new machine learning framework, Multi-Instance Multi-Label learning (MIML). We propose a new MIML support vector machine to solve the problems that beset the annotation task. Empirical study shows that the proposed method outperforms the state-of-the-art Drosophila gene expression pattern annotation methods. 1