Results 1 - 10 of 78
Location Recognition using Prioritized Feature Matching
Cited by 71 (5 self)
Abstract. We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that is not available when considering images in isolation. For instance, we can predict which views will be the most common, which feature points in a scene are most reliable, and which features in the scene tend to co-occur in the same image. Based on this information, we devise an adaptive, prioritized algorithm for matching a representative set of SIFT features covering a large scene to a query image for efficient localization. Our approach is based on considering features in the scene database, and matching them to query image features, as opposed to more conventional methods that match image features to visual words or database features. We find this approach results in improved performance, due to the richer knowledge of characteristics of the database features compared to query image features. We present experiments on two large city-scale photo collections, showing that our algorithm compares favorably to image retrieval-style approaches to location recognition.
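The prioritized matching the abstract describes (scene-database features matched to query features, most-reliable first) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `db_priority` is a hypothetical stand-in for the visibility statistics recovered from the photo collection, and the descriptors are toy 2-D vectors rather than SIFT.

```python
import numpy as np

def prioritized_match(db_desc, db_priority, query_desc, needed=3, ratio=0.8):
    """Match database features to query features in priority order,
    stopping early once `needed` matches pass Lowe's ratio test."""
    order = np.argsort(-np.asarray(db_priority, dtype=float))  # most-seen first
    matches = []
    for i in order:
        d = np.linalg.norm(query_desc - db_desc[i], axis=1)
        nn = np.argsort(d)[:2]                 # two nearest query features
        if d[nn[0]] < ratio * d[nn[1]]:        # ratio test on the query side
            matches.append((int(i), int(nn[0])))
            if len(matches) >= needed:
                break                          # early termination
    return matches
```

The early break is the point of the prioritization: a confident localization can stop after a handful of high-priority features instead of matching the whole database.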
Landmark Classification in Large-scale Image Collections
Cited by 50 (4 self)
With the rise of photo-sharing websites such as Facebook and Flickr has come dramatic growth in the number of photographs online. Recent research in object recognition has used such sites as a source of image data, but the test images have been selected and labeled by hand, yielding relatively small validation sets. In this paper we study image classification on a much larger dataset of 30 million images, nearly 2 million of which have been labeled into one of 500 categories. The dataset and categories are formed automatically from geotagged photos from Flickr, by looking for peaks in the spatial geotag distribution corresponding to frequently photographed landmarks. We learn models for these landmarks with a multiclass support vector machine, using vector-quantized interest point descriptors as features. We also explore the non-visual information available on modern photo-sharing sites, showing that using textual tags and temporal constraints leads to significant improvements in classification rate. We find that in some cases image features alone yield classification accuracy comparable to using text tags as well as to the performance of human observers.
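The "vector-quantized interest point descriptors" used as features above are bag-of-words histograms. A minimal sketch of that quantization step (the paper then trains a multiclass SVM on such histograms; the toy codebook and 2-D descriptors here are illustrative assumptions):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Vector-quantize local descriptors against a visual codebook and
    return an L1-normalized bag-of-words histogram."""
    # distance from every descriptor to every codeword
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                   # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Each image becomes a fixed-length histogram regardless of how many interest points it has, which is what makes a linear classifier applicable.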
What Makes Paris Look Like Paris?
2012
Cited by 49 (8 self)
Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering
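The search problem stated above, patches that are both frequent and geographically informative, implies a scoring criterion. The following is a hypothetical stand-in for the paper's discriminative-clustering objective, not its actual formulation: it simply rewards elements that fire often inside the target city and rarely elsewhere.

```python
def geo_informativeness(in_hits, out_hits, n_in, n_out, eps=1e-9):
    """Score a candidate visual element: close to 1 when it occurs
    frequently in the target-area image set (n_in images) and rarely
    in the rest-of-world set (n_out images)."""
    p_in = in_hits / n_in      # detection rate inside the area
    p_out = out_hits / n_out   # detection rate elsewhere
    return p_in / (p_in + p_out + eps)
```

A Paris balcony detector firing on 50 of 100 Paris images but only 5 of 1000 non-Paris images would score near 1; a generic window firing everywhere would score near 0.5.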
Worldwide pose estimation using 3d point clouds
In ECCV, 2012
Cited by 42 (5 self)
Abstract. We address the problem of determining where a photo was taken by estimating a full 6-DOF-plus-intrinsics camera pose with respect to a large geo-registered 3D point cloud, bringing together research on image localization, landmark recognition, and 3D pose estimation. Our method scales to datasets with hundreds of thousands of images and tens of millions of 3D points through the use of two new techniques: a co-occurrence prior for RANSAC and bidirectional matching of image features with 3D points. We evaluate our method on several large datasets, and show state-of-the-art results on landmark recognition as well as the ability to locate cameras to within meters, requiring only seconds per query.
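The co-occurrence prior for RANSAC mentioned above biases minimal-sample selection toward 3D points that were seen together in the same database images, since such points are more likely to be simultaneously visible from the query view. A minimal sketch of that biased sampling idea (the co-visibility matrix and sample size here are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def sample_covisible(n_matches, covis, k=3, seed=0):
    """Draw a RANSAC minimal sample of k 2D-3D matches, biased toward
    3D points that co-occur: covis[i][j] counts how many database
    images observe both point i and point j."""
    rng = np.random.default_rng(seed)
    picked = [int(rng.integers(n_matches))]        # uniform first pick
    while len(picked) < k:
        w = np.asarray(covis[picked[-1]], dtype=float) + 1e-6
        w[picked] = 0.0                            # never repeat a match
        picked.append(int(rng.choice(n_matches, p=w / w.sum())))
    return picked
```

Compared with uniform sampling, this concentrates hypotheses on mutually consistent points, so fewer iterations are wasted on geometrically impossible minimal sets.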
Image Webs: Computing and Exploiting Connectivity in Image Collections
Cited by 30 (2 self)
The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger scales. We show how to efficiently construct Image Webs that capture the connectivity in an image collection using spectral graph theory. Our technique can link together tens of thousands of images in a few minutes using a computer cluster. We also demonstrate applications for exploring collections based on global topological analysis.
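The spectral-graph-theory angle above can be made concrete with the standard connectivity test on a graph Laplacian: the second-smallest eigenvalue (the algebraic connectivity) is positive exactly when the image graph forms one connected component. This is a textbook illustration of the tool, not the paper's construction pipeline.

```python
import numpy as np

def algebraic_connectivity(adj):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A:
    positive iff the image graph is a single connected component."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj       # unnormalized Laplacian
    return float(np.sort(np.linalg.eigvalsh(lap))[1])
```

For a path of three linked images the value is 1.0; split one link off and it drops to 0, signaling a disconnected Image Web.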
Retrieving landmark and nonlandmark images from community photo collections
In ACM Multimedia, 2010
Cited by 27 (6 self)
State-of-the-art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g. images containing landmarks or associated with Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by grouping visually consistent ones while providing a guaranteed distortion bound. This allows us, for instance, to represent the visual content of all the thousands of images depicting the Parthenon in just a few dozen scene maps and still be able to retrieve any single, isolated, non-landmark image like a house or graffiti on a wall. Starting from a geo-tagged dataset, we first group images geographically and then visually, where each visual cluster is assumed to depict different views of the same scene. We align all views to one reference image and construct a 2D scene map by preserving details from all images while discarding repeating visual features. Our indexing, retrieval and spatial matching scheme then operates directly on scene maps. We evaluate the precision of the proposed method on a challenging one-million urban image dataset.
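The guaranteed distortion bound the abstract emphasizes can be sketched with a greedy quantizer: an item joins an existing cluster only if it lies within the bound of that cluster's founding member, otherwise it founds its own. This is an illustrative simplification of the idea, not the paper's actual clustering algorithm.

```python
import numpy as np

def cluster_with_bound(features, bound):
    """Greedy bounded-distortion quantization: each item joins the first
    cluster whose founding center is within `bound`, else starts a new
    cluster, so assignment distortion never exceeds `bound`."""
    centers, labels = [], []
    for f in features:
        for c, center in enumerate(centers):
            if np.linalg.norm(f - center) <= bound:
                labels.append(c)
                break
        else:                                  # no center close enough
            centers.append(f)
            labels.append(len(centers) - 1)
    return centers, labels
```

Because centers never move, the per-item distortion guarantee holds by construction; the trade-off is that the number of clusters adapts to the data rather than being fixed in advance.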
Scene Reconstruction and Visualization From Community Photo Collections
2010
Cited by 14 (2 self)
We describe recent progress in digitizing and visualizing the world from data captured by people taking photos and uploading them to the web.
Finding Media Illustrating Events
In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR ’11, 2011
Cited by 14 (4 self)
We present a method combining semantic inferencing and visual analysis for automatically finding media (photos and videos) that illustrate events. We report on experiments validating our heuristic for mining media sharing platforms and large event directories in order to mutually enrich the descriptions of the content they host. Our overall goal is to design a web-based environment that allows users to explore and select events, to inspect associated media, and to discover meaningful, surprising or entertaining connections between events, media and people participating in events. We present a large dataset composed of semantic descriptions of events, photos and videos interlinked with the larger Linked Open Data cloud, and we show the benefits of using semantic web technologies for integrating multimedia metadata.
Satellites in Our Pockets: An Object Positioning System Using Smartphones
In Proc. of MobiSys, 2012
Cited by 11 (0 self)
This paper attempts to solve the following problem: can a distant object be localized by looking at it through a smartphone? As an example use-case, while driving on a highway entering New York, we want to look at one of the skyscrapers through the smartphone camera and compute its GPS location. While the problem would have been far more difficult five years ago, the growing number of sensors on smartphones, combined with advances in computer vision, has opened up important opportunities. We harness these opportunities through a system called Object Positioning System (OPS) that achieves reasonable localization accuracy. Our core technique uses computer vision to create an approximate 3D structure of the object and camera, and applies mobile phone sensors to scale and rotate the structure to its absolute configuration. Then, by solving (nonlinear) optimizations on the residual (scaling and rotation) error, we ultimately estimate the object’s GPS position. We have developed OPS on Android Nexus S phones and experimented with localizing 50 objects on the Duke University campus. We believe that OPS shows promising results, enabling a variety of applications. Our ongoing work is focused on coping with large GPS errors, which prove to be the prime limitation of the current prototype.
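Once OPS has scaled and rotated the 3D structure to its absolute configuration, the final step reduces to projecting a distance along a compass heading from the camera's GPS fix. A minimal flat-Earth sketch of just that geometric step (the full system solves a nonlinear optimization over the sensor and vision residuals; this simplification only holds for short ranges):

```python
import math

EARTH_R = 6371000.0  # mean Earth radius in meters

def object_gps(cam_lat, cam_lon, heading_deg, distance_m):
    """Dead-reckon an object's latitude/longitude from the camera's GPS
    fix, the compass heading to the object, and the vision-estimated
    distance, using a small-displacement flat-Earth approximation."""
    h = math.radians(heading_deg)
    d_north = distance_m * math.cos(h)
    d_east = distance_m * math.sin(h)
    lat = cam_lat + math.degrees(d_north / EARTH_R)
    # longitude degrees shrink with latitude, hence the cos() correction
    lon = cam_lon + math.degrees(d_east / (EARTH_R * math.cos(math.radians(cam_lat))))
    return lat, lon
```

The approximation makes clear why GPS error dominates: any error in the camera fix translates one-to-one into the object estimate, on top of heading and distance errors.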