Results 1 - 10
of
150
80 million tiny images: a large dataset for non-parametric object and scene recognition
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
"... ..."
Evaluating Color Descriptors for Object and Scene Recognition
, 2010
"... Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used for feature extraction at salient points. To increase illumination invariance and discriminative power, color descriptors have been ..."
Abstract
-
Cited by 99 (14 self)
- Add to MetaCart
Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used for feature extraction at salient points. To increase illumination invariance and discriminative power, color descriptors have been proposed. Because many different descriptors exist, a structured overview is required of color invariant descriptors in the context of image category recognition. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors (software to compute the color descriptors from this paper is available from
Zwol. Flickr tag recommendation based on collective knowledge
- In WWW ’08: Proc. of the 17th International Conference on World Wide Web
, 2008
"... Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so called tags, which describe the contents of the photo or provide addit ..."
Abstract
-
Cited by 59 (2 self)
- Add to MetaCart
Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so called tags, which describe the contents of the photo or provide additional contextual and semantical information. In this paper we investigate how we can assist users in the tagging phase. The contribution of our research is twofold. We analyse a representative snapshot of Flickr and present the results by means of a tag characterisation focussing on how users tags photos and what information is contained in the tagging. Based on this analysis, we present and evaluate tag recommendation strategies to support the user in the photo annotation task by recommending a set of tags that can be added to the photo. The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of original tagging.
Small codes and large image databases for recognition
- In Proceedings of the IEEE Conf on Computer Vision and Pattern Recognition
, 2008
"... The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition [2], computer graphics [11, 27], personal photo collections, online image search tools. In ..."
Abstract
-
Cited by 46 (4 self)
- Add to MetaCart
The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition [2], computer graphics [11, 27], personal photo collections, online image search tools. In this paper, our goal is to develop efficient image search and scene matching techniques that are not only fast, but also require very little memory, enabling their use on standard hardware or even on handheld devices. Our approach uses recently developed machine learning techniques to convert the Gist descriptor (a real valued vector that describes orientation energies at different scales and orientations within an image) to a compact binary code, with a few hundred bits per image. Using our scheme, it is possible to perform real-time searches with millions from the Internet using a single large PC and obtain recognition results comparable to the full descriptor. Using our codes on high quality labeled images from the LabelMe database gives surprisingly powerful recognition results using simple nearest neighbor techniques. Recent interest in object recognition has yielded a wide range of approaches to describing the contents of an image. One important application for this technology is the visual search of large collections of images, such as those on the Internet or on people’s home computers. Accordingly, a number of recognition papers have explored this area. Nister and Stewenius demonstrate the real-time specific object recognition using a database of 40,000 images [19]; Obdrzalek and Matas show sub-linear indexing time on the COIL dataset [20]. A common theme is the representation of the image as a collection of feature vectors and the use of efficient data structures to handle the large num-
A New Baseline for Image Annotation
"... Abstract. Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, mo ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
Abstract. Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, most of these works fail to compare their methods with simple baseline techniques to justify the need for complex models and subsequent training. In this work, we introduce a new baseline technique for image annotation that treats annotation as a retrieval problem. The proposed technique utilizes low-level image features and a simple combination of basic distances to find nearest neighbors of a given image. The keywords are then assigned using a greedy label transfer mechanism. The proposed baseline outperforms the current state-of-the-art methods on two standard and one large Web dataset. We believe that such a baseline measure will provide a strong platform to compare and better understand future annotation techniques. 1
Learning tag relevance by neighbor voting for social image retrieval
- In ACM MIR
, 2008
"... Social image retrieval is important for exploiting the increasing amounts of amateur-tagged multimedia such as Flickr images. Since amateur tagging is known to be uncontrolled, ambiguous, and personalized, a fundamental problem is how to reliably interpret the relevance of a tag with respect to the ..."
Abstract
-
Cited by 13 (5 self)
- Add to MetaCart
Social image retrieval is important for exploiting the increasing amounts of amateur-tagged multimedia such as Flickr images. Since amateur tagging is known to be uncontrolled, ambiguous, and personalized, a fundamental problem is how to reliably interpret the relevance of a tag with respect to the visual content it is describing. Intuitively, if different persons label similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a novel algorithm that scalably and reliably learns tag relevance by accumulating votes from visually similar neighbors. Further, treated as tag frequency, learned tag relevance is seamlessly embedded into current tag-based social image retrieval paradigms. Preliminary experiments on one million Flickr images demonstrate the potential of the proposed algorithm. Overall comparisons for both single-word queries and multiple-word queries show substantial improvement over the baseline by learning and using tag relevance. Specifically, compared with the baseline using the original tags, on average, retrieval using improved tags increases mean average precision by 24%, from 0.54 to 0.67. Moreover, simulated experiments indicate that performance can be improved further by scaling up the amount of images used in the proposed neighbor voting algorithm.
Ontologies and the Semantic Web
"... The goal of semantic web research is to allow the vast range of web-accessible information and services to be more effectively exploited by both humans and automated tools. To facilitate this process, RDF and OWL have been developed as standard formats for the sharing and integration of data and kno ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
The goal of semantic web research is to allow the vast range of web-accessible information and services to be more effectively exploited by both humans and automated tools. To facilitate this process, RDF and OWL have been developed as standard formats for the sharing and integration of data and knowledge—the latter in the form of rich conceptual schemas called ontologies. These languages, and the tools developed to support them, have rapidly become de facto standards for ontology development and deployment; they are increasingly used, not only in research labs, but in large scale IT projects. Although many research and development challenges still remain, these “semantic web technologies” are already starting to exert a major influence on the development of information technology. 1.
Data Clustering: 50 Years Beyond K-Means
, 2008
"... Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and m ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general purpose clustering algorithm and the illposed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection, and data clustering and large scale data clustering.
Learning Image Similarity from Flickr Groups Using Stochastic Intersection Kernel Machines
"... Measuring image similarity is a central topic in computer vision. In this paper, we learn similarity from Flickr groups and use it to organize photos. Two images are similar if they are likely to belong to the same Flickr groups. Our approach is enabled by a fast Stochastic Intersection Kernel MAchi ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Measuring image similarity is a central topic in computer vision. In this paper, we learn similarity from Flickr groups and use it to organize photos. Two images are similar if they are likely to belong to the same Flickr groups. Our approach is enabled by a fast Stochastic Intersection Kernel MAchine (SIKMA) training algorithm, which we propose. This proposed training method will be useful for many vision problems, as it can produce a classifier that is more accurate than a linear classifier, trained on tens of thousands of examples in two minutes. The experimental results show our approach performs better on image matching, retrieval, and classification than using conventional visual features. 1.
Textual Query of Personal Photos Facilitated by Large-scale Web Data
"... Abstract—The rapid popularization of digital cameras and mobile phone cameras has lead to an explosive growth of personal photo collections by consumers. In this paper, we present a real-time textual query based personal photo retrieval system by leveraging millions of web images and their associate ..."
Abstract
-
Cited by 8 (6 self)
- Add to MetaCart
Abstract—The rapid popularization of digital cameras and mobile phone cameras has lead to an explosive growth of personal photo collections by consumers. In this paper, we present a real-time textual query based personal photo retrieval system by leveraging millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., “water”), our system exploits the inverted file to automatically find the positive web images that are related to the textual query “water ” as well as the negative web images that are irrelevant to the textual query. Based on these automatically retrieved relevant and irrelevant web images, we employ three simple but effective classification methods, k Nearest Neighbor (kNN), decision stumps and linear SVM, to rank personal photos. To further improve the photo retrieval performance, we propose two relevance feedback methods via cross-domain learning, which effectively utilize both the web images and personal images. In particular, our proposed cross-domain learning methods can learn robust classifiers with only a very limited amount of labeled personal photos from the user by leveraging the pre-learned linear SVM classifiers in real time. We further propose an incremental cross-domain learning method in order to significantly accelerate the relevance feedback process on large consumer photo databases. Extensive experiments on two consumer photo datasets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.

