Matching words and pictures
Journal of Machine Learning Research, 2003
Cited by 391 (33 self)
We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann’s hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation
Learning the Semantics of Words and Pictures
2000
Cited by 179 (11 self)
We present a statistical model for organizing image collections which integrates semantic information provided by associated text and visual information provided by image features. The model is very promising for information retrieval tasks such as database browsing and searching for images based on text and/or image features. Furthermore, since the model learns relationships between text and image features, it can be used for novel applications such as associating words with pictures, and unsupervised learning for object recognition.
Multimodal Video Indexing: A Review of the State-of-the-art
Multimedia Tools and Applications, 2003
Cited by 103 (18 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is infeasible for large video collections. In this paper we survey several methods aiming at automating this time- and resource-consuming process. Good reviews of single-modality video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in the literature. It furthermore forms the basis for categorizing these different methods.
Probabilistic Methods for Finding People
International Journal of Computer Vision, 2001
Cited by 77 (2 self)
Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic properties. Since a reasonable model of a person requires at least nine segments, it is not possible to inspect every group, due to the huge combinatorial complexity. We propose two
An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages
In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000
Cited by 74 (2 self)
The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a need for reliable anti-spam e-mail filters. Filters of this type have so far been based mostly on manually constructed keyword patterns. An alternative approach has recently been proposed, whereby a Naive Bayesian classifier is trained automatically to detect spam messages. We test this approach on a large collection of personal e-mail messages, which we make publicly available in "encrypted" form, contributing towards standard benchmarks. We introduce appropriate cost-sensitive measures, investigating at the same time the effect of attribute-set size, training-corpus size, lemmatization, and stop lists, issues that have not been explored in previous experiments. Finally, the Naive Bayesian filter is compared, in terms of performance, to a filter that uses keyword patterns, and which is part of a widely used e-mail reader.
Keywords: filtering/routing; text categorization; machine learning and IR; evaluation (general); test collections
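As a rough illustration of the idea in this abstract, the sketch below trains a small multinomial Naive Bayes filter and applies a cost-sensitive decision threshold. It is not the authors' implementation: the function names, the Laplace smoothing, the toy messages, and the `cost_ratio` parameter (how many times worse it is to block a legitimate message than to let a spam message through) are all assumptions made for the example.

```python
import math
from collections import Counter

LABELS = ("spam", "legit")


def train_nb(docs):
    """Count words per class. docs is a list of (tokens, label) pairs.

    Assumes both labels occur at least once in the training data.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for tokens, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["legit"])
    return word_counts, doc_counts, vocab


def spam_probability(tokens, model):
    """Posterior P(spam | tokens) under multinomial NB with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["legit"])


def is_spam(tokens, model, cost_ratio=1.0):
    """Flag a message only if P(spam) exceeds cost_ratio / (1 + cost_ratio)."""
    threshold = cost_ratio / (1.0 + cost_ratio)
    return spam_probability(tokens, model) > threshold


if __name__ == "__main__":
    training = [
        (["cheap", "pills", "buy"], "spam"),
        (["buy", "now", "cheap"], "spam"),
        (["meeting", "tomorrow", "agenda"], "legit"),
        (["lunch", "tomorrow"], "legit"),
    ]
    model = train_nb(training)
    message = ["cheap", "buy"]
    print(spam_probability(message, model))
    print(is_spam(message, model), is_spam(message, model, cost_ratio=9.0))
```

Raising `cost_ratio` makes the filter more conservative: the same message can be flagged at a ratio of 1 and let through at a ratio of 9, which is one way to express that misclassifying legitimate mail is the costlier error.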
Peekaboom: A Game for Locating Objects in Images
In ACM CHI, 2006
Cited by 70 (4 self)
We introduce Peekaboom, an entertaining web-based game that can help computers locate objects in images. People play the game because of its entertainment value, and as a side effect of them playing, we collect valuable image metadata, such as which pixels belong to which object in the image. The collected data could be applied towards constructing more accurate computer vision algorithms, which require massive amounts of training and testing data not currently available. Peekaboom has been played by thousands of people, some of whom have spent over 12 hours a day playing, and thus far has generated millions of data points. In addition to its purely utilitarian aspect, Peekaboom is an example of a new, emerging class of games, which not only bring people together for leisure purposes, but also exist to improve artificial intelligence. Such games appeal to a general audience, while providing answers to problems that computers cannot yet solve.
Author Keywords: Distributed knowledge acquisition, object segmentation, object recognition, computer vision, Web-based games.
ACM Classification Keywords: I.2.6 [Learning]: Knowledge acquisition. H.5.3 [HCI]: Web-based interaction.
Body Plans
1997
Cited by 67 (4 self)
This paper describes a representation for people and animals, called a body plan, which is adapted to segmentation and to recognition in complex environments. The representation is an organized collection of grouping hints obtained from a combination of constraints on color and texture and constraints on geometric properties such as the structure of individual parts and the relationships between parts. Body plans can be learned from image data, using established statistical learning techniques. The approach is illustrated with two examples of programs that successfully use body plans for recognition: one example involves determining whether a picture contains a scantily clad human, using a body plan built by hand; the other involves determining whether a picture contains a horse, using a body plan learned from image data. In both cases, the system demonstrates excellent performance on large, uncontrolled test sets and very large and diverse control sets. Keywords: Object Recognition, ...
A survey on pixel-based skin color detection techniques
In ICCGV, 2003
Cited by 59 (2 self)
Skin color has proven to be a useful and robust cue for face detection, localization and tracking. Image content filtering, content-aware video compression and image color balancing applications can also benefit from automatic detection of skin in images. Numerous techniques for skin color modelling and recognition have been proposed over the past several years. A few papers comparing different approaches have been published [Zarit et al. 1999], [Terrillon et al. 2000], [Brand and Mason 2000]. However, a comprehensive survey on the topic is still missing. We try to fill this gap by reviewing the most widely used methods and techniques and collecting their numerical evaluation results.
Spatial Color Indexing and Applications
1998
Cited by 57 (3 self)
We suggest the use of the color correlogram as a generic indexing tool to tackle various computer vision problems. Correlograms were shown to be very effective for content-based image retrieval [4]. We adapt the correlogram to handle the problems of image subregion querying, object localization, object tracking, and cut detection. Experimental results suggest that the color correlogram is much more effective than the histogram for these applications, with insignificant additional computational, storage, or processing cost. We also provide a technique to cut down the storage requirement of correlograms so that it is the same as that of histograms, with only negligible performance penalty compared to the original correlogram.
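To make the contrast with histograms concrete, here is a minimal brute-force sketch of a color auto-correlogram: for each quantized color c and distance d, it estimates the probability that a pixel at distance d from a pixel of color c also has color c. The function name, the use of chessboard (L-infinity) distance, the nested-list image format, and the toy images are assumptions for illustration; this is not the efficient algorithm or the storage-reduction technique the abstract refers to.

```python
def auto_correlogram(image, num_colors, distances):
    """Brute-force auto-correlogram of a 2D grid of color indices.

    image: list of rows, each a list of ints in range(num_colors).
    Returns {d: [p_0, ..., p_{num_colors-1}]}, where p_c is the fraction of
    in-bounds pixels at chessboard distance d from a color-c pixel that
    also have color c (0.0 if color c has no such neighbors).
    """
    height, width = len(image), len(image[0])
    result = {}
    for d in distances:
        same = [0] * num_colors
        total = [0] * num_colors
        for y in range(height):
            for x in range(width):
                color = image[y][x]
                for dy in range(-d, d + 1):
                    for dx in range(-d, d + 1):
                        if max(abs(dy), abs(dx)) != d:
                            continue  # keep only the ring at distance d
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            total[color] += 1
                            if image[ny][nx] == color:
                                same[color] += 1
        result[d] = [s / t if t else 0.0 for s, t in zip(same, total)]
    return result


if __name__ == "__main__":
    # Two 4x4 images with identical histograms (8 pixels of each color)
    # but different spatial layouts.
    checkerboard = [[(x + y) % 2 for x in range(4)] for y in range(4)]
    halves = [[0 if y < 2 else 1 for _ in range(4)] for y in range(4)]
    print(auto_correlogram(checkerboard, 2, [1]))
    print(auto_correlogram(halves, 2, [1]))
```

The two toy images are indistinguishable by histogram, yet the image with two solid halves gets a noticeably higher same-color probability at distance 1 than the checkerboard, which is the kind of spatial information a histogram discards.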
Detecting, Localizing and Grouping Repeated Scene Elements From an Image
1996
Cited by 41 (5 self)
This paper presents an algorithm for detecting, localizing and grouping instances of repeated scene elements. The grouping is represented by a graph where nodes correspond to individual elements and arcs join spatially neighboring elements. Associated with each arc is an affine map that best transforms the image patch at one location to the other. The approach we propose consists of 4 steps: (1) detecting "interesting" elements in the image; (2) matching elements with their neighbors and estimating the affine transform between them; (3) growing the element to form a more distinctive unit; and (4) grouping the elements. The idea is analogous to tracking...

