Results 1 - 10 of 92
Tour the world: Building a web-scale landmark recognition engine
- in: IEEE Conference on Computer Vision and Pattern Recognition, Electronic Proceedings, 2009
Cited by 78 (1 self)
Modeling and recognizing landmarks at world-scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large scale system. This paper leverages the vast amount of multimedia data on the web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques, to address these issues. First, a comprehensive list of landmarks is mined from two sources: (1) ∼20 million GPS-tagged photos and (2) online tour guide web pages. Candidate images for each landmark are then obtained from photo sharing websites or by querying an image search engine. Second, landmark visual models are built by pruning candidate images using efficient image matching and unsupervised clustering techniques. Finally, the landmarks and their visual models are validated by checking authorship of their member images. The resulting landmark recognition engine incorporates 5312 landmarks from 1259 cities in 144 countries. The experiments demonstrate that the engine can deliver satisfactory recognition performance with high efficiency.
Placing Flickr Photos on a Map
Cited by 68 (5 self)
In this paper we investigate generic methods for placing photos uploaded to Flickr on the World map. As primary input for our methods we use the textual annotations provided by the users to predict the single most probable location where the image was taken. Central to our approach is a language model based entirely on the annotations provided by users. We define extensions to improve over the language model using tag-based smoothing and cell-based smoothing, and leveraging spatial ambiguity. Further, we demonstrate how to incorporate GeoNames, a large external database of locations. For varying levels of granularity, we are able to place images on a map with at least twice the precision of the state-of-the-art reported in the literature.
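As a rough illustration of the idea of a tag-based language model over map cells, the following minimal sketch assigns a photo to the grid cell whose smoothed tag distribution best explains its tags. It is not the authors' method: the fixed-size degree grid, the add-alpha smoothing, the uniform prior over cells, and the names `cell_of`, `train`, and `place` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def cell_of(lat, lon, size=1.0):
    """Map a coordinate to a grid cell of `size` degrees (hypothetical grid)."""
    return (math.floor(lat / size), math.floor(lon / size))

def train(photos, size=1.0):
    """photos: iterable of (lat, lon, tags). Returns tag counts per cell."""
    counts = defaultdict(Counter)
    for lat, lon, tags in photos:
        counts[cell_of(lat, lon, size)].update(tags)
    return counts

def place(tags, counts, alpha=1.0):
    """Return the cell with the highest add-alpha smoothed log-likelihood.

    Assumes a uniform prior over cells; returns None if nothing was trained.
    """
    vocab = {t for c in counts.values() for t in c}
    best, best_score = None, -math.inf
    for cell, c in counts.items():
        total = sum(c.values())
        score = sum(
            math.log((c[t] + alpha) / (total + alpha * len(vocab)))
            for t in tags
        )
        if score > best_score:
            best, best_score = cell, score
    return best

# Toy training data, invented for illustration only.
photos = [
    (48.85, 2.35, ["paris", "eiffel"]),
    (48.86, 2.29, ["eiffel", "tower"]),
    (40.70, -74.00, ["nyc", "liberty"]),
]
counts = train(photos)
print(place(["eiffel"], counts))
```

Add-alpha smoothing here only stands in for the tag-based and cell-based smoothing named in the abstract, whose details are not given in this listing.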
Geographical Topic Discovery and Comparison
Cited by 36 (4 self)
This paper studies the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents have become popular with the pervasiveness of location-acquisition technologies. For example, in Flickr, the geo-tagged photos are associated with tags and GPS locations. In Twitter, the locations of the tweets can be identified by the GPS locations from smart phones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this paper, we are interested in two questions: (1) how to discover different topics of interest that are coherent in geographical regions? (2) how to compare several topics across different geographical locations? To answer these questions, this paper proposes and compares three ways of modeling geographical topics: location-driven model, text-driven model, and a novel joint model called LGTA (Latent Geographical Topic Analysis) that combines location and text. To make a fair comparison, we collect several representative datasets from the Flickr website including Landscape, Activity, Manhattan, National park, Festival, Car, and Food. The results show that the first two methods work in some datasets but fail in others. LGTA works well in all these datasets at not only finding regions of interest but also providing effective comparisons of the topics across different locations. The results confirm our hypothesis that the geographical distributions can help in modeling topics, while topics provide important cues to group different geographical regions.
Enhancing Diversity, Coverage and Balance for Summarization through Structure Learning
- WWW 2009 MADRID!, 2009
Cited by 33 (0 self)
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom simultaneously consider summary diversity, coverage, and balance issues, which to a large extent determine the quality of summaries. In this paper, we consider extract-based summarization emphasizing the following three requirements: 1) diversity in summarization, which seeks to reduce redundancy among sentences in the summary; 2) sufficient coverage, which focuses on avoiding the loss of the document’s main information when generating the summary; and 3) balance, which demands that different aspects of the document have about the same relative importance in the summary. We formulate the extract-based summarization problem as learning a mapping from a set of sentences of a given document to a subset of the sentences that satisfies the above three requirements. The mapping is learned by incorporating several constraints in a structure learning framework, and we explore the graph structure of the output variables and employ structural SVM for solving the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.
Less Talk, More Rock: Automated Organization of Community-Contributed Collections of Concert Videos
- WWW 2009, 2009
Cited by 30 (3 self)
We describe a system for synchronization and organization of user-contributed content from live music events. We start with a set of short video clips taken at a single event by multiple contributors, who were using a varied set of capture devices. Using audio fingerprints, we synchronize these clips such that overlapping clips can be displayed simultaneously. Furthermore, we use the timing and link structure generated by the synchronization algorithm to improve the findability and representation of the event content, including identifying key moments of interest and descriptive text for important captured segments of the show. We also identify the preferred audio track when multiple clips overlap. We thus create a much improved representation of the event that builds on the automatic content match. Our work demonstrates important principles in the use of content analysis techniques for social media content on the Web, and applies those principles in the domain of live music capture.
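One common way to align two recordings from fingerprint matches is offset voting, sketched below under stated assumptions. The abstract does not specify the fingerprinting or matching scheme, so the `(time, hash)` input format, the 0.1-second rounding, and the function name `estimate_offset` are hypothetical choices for this example only.

```python
from collections import Counter

def estimate_offset(fp_a, fp_b):
    """Estimate the time offset between two clips by voting.

    fp_a, fp_b: lists of (time_sec, hash) fingerprints (assumed format).
    Every pair of equal hashes votes for the difference t_a - t_b; the
    most frequent difference wins. Returns (offset, votes), or None if
    the clips share no hash.
    """
    index = {}
    for t, h in fp_b:
        index.setdefault(h, []).append(t)
    votes = Counter(
        round(t_a - t_b, 1)
        for t_a, h in fp_a
        for t_b in index.get(h, ())
    )
    return votes.most_common(1)[0] if votes else None

# Toy fingerprints, invented for illustration only.
clip_a = [(10.0, "x"), (11.0, "y"), (12.0, "z")]
clip_b = [(0.0, "x"), (1.0, "y"), (2.0, "z"), (5.0, "x")]
print(estimate_offset(clip_a, clip_b))
```

The spurious match at 5.0 seconds receives a single vote, so the consistent offset shared by the other three matches wins. The vote count could also serve as a rough confidence score for whether two clips overlap at all.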
Exploiting Flickr Tags and Groups for Finding Landmark Photos
Cited by 19 (4 self)
Many people take pictures of different city landmarks and post them to photo-sharing systems like Flickr. They also add tags and place photos in Flickr groups, created around particular themes. Using tags, other people can search for representative landmark images of places of interest. Searching for landmarks using tags results in many non-landmark photos and provides a poor landmark summary for a city. In this paper we propose a new method to identify landmark photos using tags and social Flickr groups. In contrast to similar modern systems, our approach is also applicable when GPS coordinates for photos are not available. The presented user study shows that the proposed method outperforms state-of-the-art systems for landmark finding.
Mining City Landmarks from Blogs by Graph Modeling
Cited by 18 (2 self)
Recent years have witnessed great prosperity in community-contributed multimedia. Discovering and summarizing knowledge from these data enables us to make better sense of the world. In this paper, we report our work on mining famous city landmarks from blogs for personalized tourist suggestions. Our main contribution is a graph modeling framework to discover city landmarks by mining blog photo correlations with community supervision. This modeling fuses context, content, and community information in a style that simulates static (PageRank) and dynamic (HITS) ranking models to highlight representative data from the consensus of blog users. As a preliminary step, we identify geographical locations of page contents to harvest city sight photos from Web blogs, based on which we structure these photos into a Scene-View hierarchy within each city. Our graph modeling consists of two phases: First, within a
Visualizing Instagram: Tracing cultural visual rhythms.
- In Proceedings of the workshop on social media visualization (socmedvis) in conjunction with the sixth international AAAI conference on weblogs and social media (ICWSM-12), 2012
Cited by 17 (1 self)
Picture-taking has never been easier. We now use our phones to snap photos and instantly share them with friends, family and strangers all around the world. Consequently, we seek ways to visualize, analyze and discover concealed sociocultural characteristics and trends in this ever-growing flow of visual information. How do we then trace global and local patterns from the analysis of visual planetary-scale data? What types of insights can we draw from the study of these massive visual materials? In this study we use Cultural Analytics visualization techniques for the study of approximately 550,000 images taken by users of the location-based social photo sharing application Instagram. By analyzing images from New York City and Tokyo, we offer a comparative visualization research that indicates differences in local color usage, cultural production rate, and varied hue intensities, all of which form a unique, local 'Visual Rhythm': a framework for the analysis of location-based visual information flows.
Scene Reconstruction and Visualization From Community Photo Collections
- 2010
Cited by 14 (2 self)
Recent progress is described in digitizing and visualizing the world from data captured by people taking photos and uploading them to the web.
Not all tags are created equal: Learning Flickr tag semantics for global annotation
- in Proceedings of IEEE International Conference on Multimedia & Expo, 2009
Cited by 12 (0 self)
Large collaborative datasets offer the challenging opportunity of creating systems capable of extracting knowledge in the presence of noisy data. In this work we explore the ability to automatically learn tag semantics by mining a global georeferenced image collection crawled from Flickr with the aim of improving an automatic annotation system. We are able to categorize sets of tags as places, landmarks, and visual descriptors. By organizing our dataset of more than 1.69 million images using a quadtree we can efficiently find geographic areas with sufficient density to provide useful results for place and landmark extraction. Precision-recall curves for our techniques, compared with previously existing work used to identify place tags and with manual groundtruth landmark annotation, show the merit of our methods applied on a world scale.
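To make the quadtree idea concrete, here is a minimal sketch of how recursive subdivision can expose dense geographic areas: regions with many points keep splitting, so dense areas end up as small leaves. This is an illustration under assumptions, not the paper's implementation; the `capacity` and `max_depth` parameters, the half-open bounds convention, and the name `quadtree_leaves` are hypothetical.

```python
def quadtree_leaves(points, bounds, capacity=2, max_depth=8):
    """Split `bounds` until each leaf holds at most `capacity` points.

    points: list of (x, y); bounds: (min_x, min_y, max_x, max_y), treated
    as half-open, so points lying on the max edges are excluded.
    Returns a list of (leaf_bounds, point_count).
    """
    x0, y0, x1, y1 = bounds
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]
    if len(inside) <= capacity or max_depth == 0:
        return [(bounds, len(inside))]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for quad in ((x0, y0, mx, my), (mx, y0, x1, my),
                 (x0, my, mx, y1), (mx, my, x1, y1)):
        leaves.extend(quadtree_leaves(inside, quad, capacity, max_depth - 1))
    return leaves

# Toy coordinates, invented for illustration only.
points = [(1, 1), (1.5, 1.5), (2, 2), (9, 9)]
leaves = quadtree_leaves(points, (0, 0, 16, 16))

# Keep only leaves that meet a (hypothetical) density threshold.
dense = [b for b, n in leaves if n >= 2]
print(dense)
```

Filtering leaves by point count, as in the last step, is one simple way to select areas with enough photos for place or landmark extraction; the threshold would need tuning on real data.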